Methods for modeling multi-dimensional domains using information theory to resolve gaps in data and in theories

ABSTRACT

Disclosed are methods for modeling multi-dimensional domains by merging multiple input data sets into a model, applying multiple dynamic theories to evolve the model, and using information theory to resolve gaps in, and discrepancies among, the data sets and the theories. One example is a three-dimensional geologic basin simulator that integrates seismic inversion techniques with other data to predict fracture location and characteristics. The geologic simulator delineates the effects of regional tectonics, petroleum-derived overpressure, and salt tectonics and constructs maps of high-grading zones of fracture producibility. A second example is a living cell simulator that uses chemical kinetic rate laws of transcription and translation polymerization to compute mRNA and protein populations as they occur autonomously, in response to changes in the surroundings, or from injected viruses or chemical factors. Features such as the eukaryotic nucleus are treated with a novel mesoscopic reaction-transport theory. Metabolic reactions take place in appropriate compartments.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] The present application is a continuation-in-part of U.S. patentapplication Ser. No. 09/818,752, filed on Mar. 27, 2001, and also claimsthe benefit of U.S. Provisional Patent Application No. 60/254,433, filedon Dec. 8, 2000, which is hereby incorporated in its entirety byreference.

TECHNICAL FIELD

[0002] The present invention relates generally to multi-dimensionalmodeling, and, more particularly, to modeling using information theoryto resolve gaps in available data and theories.

BACKGROUND OF THE INVENTION

[0003] The benefits of modeling complex, multi-dimensional domains havelong been known. For example, accurate models of geologic domainsenhance petroleum extraction while minimizing exploration and productioncosts. Dynamic models of living cells provide insight into cellularbehavior and are useful for predicting the effects of pharmaceuticalsand optimizing treatment strategies. Modem sampling and measurementtechniques provide a wealth of data sets but are usually only indirectlyrelated to the required input for these models. Physical and chemicaltheories have the potential to show how the system modeled should evolvethrough time and across space. Furthermore, in a number of applicationsthere are a variety of types of data of varying quality which could, inprinciple, be used to constrain models if an objective approach toevaluating and integrating these data with the models were available.

[0004] However, rarely are a complete set of input data and dynamictheories available to the modeler. As a first example of thisincompleteness, consider models used by the petroleum industry. Interestin the remote detection of fractures in tight geologic reservoirs hasgrown as new discoveries of oil and natural gas from conventionalreservoirs have declined. The trend in remote detection is to invertseismic data. The problem is that such an inversion may not be possiblein principle because a variety of fluid/rock states (grain size, shape,and packing for all minerals; fracture network statistics; and porosity,wetting, saturation, and composition of each fluid phase) yield the samelog or seismic response. For example, in an azimuthally anisotropicmedium, the principal directions of azimuthal anisotropy are thedirections along which the compressional and shear waves propagate. Ifanisotropy is due solely to fractures, anisotropy data can be used tostudy dominant fracture orientations. However, observed rose diagramsshow that in most cases a fracture network consists of many intersectingfracture orientations. Geochemical data (pore fluid composition, fluidinclusion analyses, and vitrinite reflectance) are often ambiguousindicators of geological history due to variations in pore-fluidcomposition and temperature during basin evolution. Furthermore, theinterpretation of well log and geochemical data is labor-intensive.Therefore, the maximum benefits of these data are often not realized.

[0005] A complete exploration and production (E&P) model characterizinga fractured reservoir requires a large number of descriptive variables(fracture density, length, aperture, orientation, and connectivity).However, remote detection techniques are currently limited to theprediction of a small number of variables. Some techniques use amplitudevariation with offsets to predict fracture orientations. Othersdelineate zones of large Poisson's ratio contrasts which correspond tohigh fracture densities. Neural networks have been used to predictfracture density. Porosity distribution may be predicted through theinversion of multicomponent, three-dimensional (3-D) seismic data. Thesepredictive techniques are currently at best limited to a few fracturenetwork properties. Most importantly, these results only hold if themedium is simpler than a typical reservoir. For example, they may workif there is one fracture orientation and no inherent anisotropy due tosediment lamination or other inhomogeneity and anisotropy.

[0006] Difficulties with remote fracture detection come from the manyfactors affecting mechanical wave speed and attenuation including:

[0007] porosity and texture of unfractured rock;

[0008] density and phases of pore- and fracture-filling fluids;

[0009] fracture length and aperture statistics and connectivity;

[0010] fracture orientation relative to the propagation direction;

[0011] fracture cement infilling volume, mineralogy, and texture;

[0012] pressure and temperature; and

[0013] grain size and shape distribution.

[0014] These variables cannot be extracted from the speed andattenuation of reflected or transmitted seismic waves, even when thevarious polarizations and shear vs. compression components areseparately monitored. Thus, direct remote detection cannot provideenough information to unambiguously identify and characterize fracturesweetspots.

[0015] The petroleum industry requires information about theproducibility of fracture networks: cement infilling; geometry,connectivity, density, and preferred orientation as well as parametersfor dual porosity/dual permeability reservoir models; stress andreservoir sensitivity to pressure drawdown; petroleum content of thematrix; and fractures. While desirable for optimal exploration andpetroleum field development, this level of detailed characterization isfar beyond available remote detection methodologies.

[0016] Models of geological basins or reservoirs require a host of inputparameters and have incomplete physical theories underlying them. Dataare usually fraught with errors and are sparse in space and time. Whatis needed is a procedure that can combine the data and models in orderto overcome the shortcomings in both and which can be used to makequantitative predictions of resource location and characteristics and toestimate uncertainties in these predictions.

[0017] Living cells are a second domain where modelers work withincomplete data sets and incomplete dynamic theories. The complexity ofthe bio-chemical, bioelectric, and mechanical processes underlying cellbehavior makes the design of drugs and treatment strategies extremelydifficult. Furthermore, the cell must be understood as a totality. Forexample, a cell model should be able to predict whether the activity ofa chemical agent targeted to a given cell process could be thwarted bythe existence of an alternative biochemical pathway or could lead tounwanted changes to other necessary processes. While many individualcellular processes are well understood, the coupling among theseprocesses should be accounted for in order to understand the fulldynamics of the cell. As the laws yielding the evolution of a cellularsystem are nonlinear in the descriptive variables (concentrations,numbers of macromolecules of various types, electric potential), a hostof nonlinear phenomena (e.g., multiple steady states, periodic orchaotic temporal evolution and self-organization) are findamentalcharacteristics of cell behavior and therefore a comprehensive, fullycoupled process model should be used to capture them.

[0018] In geologic, biologic, and other modeling, what is needed is away to merge multiple types of input data sets into a model and to usecomprehensive (multiple process) dynamic theories to evolve the modelall the while resolving gaps in, and discrepancies among, the data setsand the theories.

SUMMARY OF THE INVENTION

[0019] The above problems and shortcomings, and others, are addressed bythe present invention, which can be understood by referring to thespecification, drawings, and claims. The present invention modelsmulti-dimensional domains based on multiple, possibly incomplete andmutually incompatible, input data sets. The invention then usesmultiple, possibly incomplete and mutually incompatible, theories toevolve the models through time and across space. Information theoryresolves gaps and conflicts in and among the data sets and theories,thus constraining the ensemble of possible processes and data values.Furthermore, as the information theory approach is based on probabilitytheory, the approach allows for the assessment of uncertainty in thepredictions.

[0020] One embodiment of the invention is a 3-D geologic basin simulatorthat integrates seismic inversion techniques with other data to predictfracture location and characteristics. The 3-D finite-element basinreaction, transport, mechanical simulator includes a rock rheology thatintegrates continuous poroelastic/viscoplastic, pressure solutionsdeformation with brittle deformation (fracturing, failure). Mechanicalprocesses are used to coevolve deformation with multi-phase flow,petroleum generation, mineral reactions, and heat transfer to predictthe location and producibility of fracture sweet spots. Informationtheory uses the geologic basin simulator predictions to integrate welllog, surface, and core data with the otherwise incomplete seismic data.The geologic simulator delineates the effects of regional tectonics,petroleum-derived overpressure, and salt tectonics and constructs mapsof high-grading zones of fracture producibility.

[0021] In a second embodiment, the invention models a living cell. Thecell simulator uses a DNA nucleotide sequence as input. Through chemicalkinetic rate laws of transcription and translation polymerization, thecell simulator computes mRNA and protein populations as they evolveautonomously, in response to changes in the surroundings, or frominjected viruses or chemical factors. Rules relating amino acid sequenceand function and the chemical kinetics of post-translational proteinmodification enable the cell simulator to capture a cell's autonomousbehavior. A full suite of biochemical processes (including glycolysis,the citric acid cycle, amino acid and nucleotide synthesis) areaccounted for with chemical kinetic laws. Features, such as theprokaryotic nucleoid and eukaryotic nucleus, are treated with a novelmesoscopic reaction-transport theory that captures atomic scale detailsand corrections to thermodynamics due to the large concentrationgradients involved. Metabolic reactions and DNA/RNA/protein synthesistake place in appropriate compartments, while the cell simulatoraccounts for active and passive molecular exchange among compartments.

RIEF DESCRIPTION OF THE DRAWINGS

[0022] While the appended claims set forth the features of the presentinvention with particularity, the invention, together with its objectsand advantages, may be best understood from the following detaileddescription taken in conjunction with the accompanying drawings ofwhich:

[0023]FIG. 1 is a schematic flow chart of the Simulation-EnhancedFracture Detection data modeling/integration approach to geologicbasins;

[0024]FIG. 2 is a table of the “laboratory” basins for use in reaction,transport, mechanical (RTM) model testing;

[0025]FIG. 3 shows the complex network of coupled processes thatunderlie the dynamics of a sedimentary basin;

[0026]FIG. 4a depicts the fluid pressuring, fracturing, and fracturehealing feedback cycle;

[0027]FIG. 4b shows the predicted evolution of overpressure at thebottom of the Ellenburger Formation;

[0028]FIG. 5 shows predicted cross-sections of permeability from asimulation of the Piceance Basin in Colorado;

[0029]FIGS. 6a and 6 b show how simulations produced by Basin RTM agreewith observations from the Piceance Basin;

[0030]FIG. 6a shows present-day fluid pressure and least compressivestress;

[0031]FIG. 6b shows that, in sandstones, lateral stress and fluidpressures are found to be similar, indicating their vulnerability tofracturing;

[0032]FIG. 6c predicts natural gas saturation;

[0033]FIG. 7 shows predicted rose diagrams for the Piceance Basin;

[0034]FIGS. 8a and 8 b are simulations of the Piceance Basin;

[0035]FIG. 8a shows an isosurface of overpressure (15 bars) toned withdepth;

[0036]FIG. 8b shows that the distribution of fracture length reflectslithologic variation and the topography imposed by the basementtectonics;

[0037]FIGS. 9a and 9 b show Basin RTM's predictions of fault-generatedfractures and their relation to the creation of fracture-mediatedcompartments and flow;

[0038]FIG. 10 is a simulated time sequence of oil saturation overlying arising salt dome;

[0039]FIG. 11 is a simulation of subsalt oil;

[0040]FIG. 12 is a simulated quarter section of a salt diapir;

[0041]FIG. 13 is a flow chart showing how the interplay of geologic dataand RTM process modules evolve a basin over each computational timeinterval;

[0042]FIG. 14 shows a prediction of Andector Field fractures;

[0043]FIG. 15 is a table of input data available for the Illinois Basin;

[0044]FIG. 16 shows a simulation of the Illinois Basin; data from theIllinois Basin have been used to simulate permeability (shown) and otherimportant reservoir parameters;

[0045]FIG. 17 shows the 3-D stratigraphy of the Illinois Basin;

[0046]FIG. 18 is a map of the Texas Gulf coastal plain showing locationsof the producing Austin Chalk trend and Giddings and Pearsall Fields;

[0047]FIG. 19 is a map of producing and explored wells along the AustinChalk trend;

[0048]FIG. 20 is a generalized cross-section through the East TexasBasin;

[0049]FIG. 21a is a cross-section of the Anadarko Basin showing majorformations and a basin-scale compartment surrounded by alithology-crossing top seal, fault, and the Woodford Shale;

[0050]FIGS. 21b, 21 c, and 21 d are 3-D views of the Anadarko Basin;

[0051]FIG. 21b shows locations of high quality pressure data;

[0052]FIG. 21c shows an isosurface of 10 bars overpressure;

[0053]FIG. 21d shows an isosurface of 7 bars underpressure;

[0054]FIG. 22 is a tectonic map of the Anadarko Basin showing majorstructures;

[0055]FIG. 23 shows a Basin RTM simulation of Piceance Basinoverpressure, dissolved gas concentration, and gas saturation;

[0056]FIG. 24 lists references to theoretical and experimental relationsbetween log tool response and fluid/rock state;

[0057]FIGS. 25a and 25 b are Basin RTM-simulated sonic log and errorgraphs used to identify basement heat flux;

[0058]FIG. 26 shows a Basin RTM simulation of lignin structural changesat the multi-well experiment site, Piceance Basin;

[0059]FIGS. 27a, 27 b, and 27 c show a zone of high permeability andreservoir risk determined using information theory;

[0060]FIGS. 28a and 28 b show an information theory-predicted highpermeability zone using fluid pressure data and a reservoir simulator aswell as minimal core data;

[0061]FIGS. 29a and 29 b list available Anadarko Basin data;

[0062]FIG. 30 is the Hunton Formation topography automaticallyconstructed from interpreted well data;

[0063]FIG. 31 is a time-lapse crosswell seismic result from Section 36of the Vacuum Field;

[0064]FIG. 32a shows a cross-section of a tortuous path showing varioustransport phenomena;

[0065]FIG. 32b shows a flow-blocking bubble or globule inhibiting theflow of a non-wetting phase;

[0066]FIG. 33 presents preliminary results of a phase geometry dynamicsmodel showing fronts of evolving saturation and wetting;

[0067]FIG. 34 compares two synthetic seismic signals created from BasinRTM-predicted data with two different assumed geothermal gradients;

[0068]FIG. 35 shows the result of using seismic data to determine basinevolution parameters;

[0069]FIGS. 36a, 36 b, and 36 c show that a reservoir reconstructionmodel requires information theory to reduce the features of a reservoirconsistent with that implied by the upscaling in the reservoir simulatorused or the resolution of the available data;

[0070]FIGS. 37a and 37 b illustrate a cross-section of an upper andlower reservoir separated by a seal with a puncture;

[0071]FIG. 38 is a map of the major onshore basins of the contiguousUnited States;

[0072]FIGS. 39a, 39 b, and 39 c are schematic views of cases wherein areservoir is segmented or contains anomalously high permeability(Super-K);

[0073]FIG. 40 is a flow chart showing how a reservoir simulator or acomplex of basin and reservoir simulators is used to integrate,interpret, and analyze a package of seismic, well log, productionhistory, and other data; when information theory is integrated with theoptimal search, the procedure also yields an estimate of uncertainty;

[0074]FIG. 41 portrays a Simulator Complex showing basin and reservoirsimulator relationships;

[0075]FIGS. 42a, 42 b, 42 c, and 42 d show a permeability distributionconstructed by information theory and reservoir simulator technology;

[0076]FIGS. 43a, 43 b, and 43 c show information theory/reservoirsimulator-predicted initial data from transient production history of anumber of wells;

[0077]FIGS. 44a and 44 b are maps of a demonstration site in the PermianBasin in New Mexico;

[0078]FIG. 44a shows waterflood units;

[0079]FIG. 44b is a stratigraphic cross-section;

[0080]FIG. 45 is a graph showing that the probability of variations of awave vector k becomes independent of k as k approaches infinity;

[0081]FIG. 46 is a data flow diagram showing how the Cyber-Cellsimulator uses DNA nucleotide sequence data in a feedback loop;

[0082]FIG. 47 shows some of the cellular features that Cyber-Cellmodels;

[0083]FIGS. 48a and 48 b suggest that Cyber-Cell can handle non-linearphenomena;

[0084]FIG. 48a is a graph of oscillations in Saccharomyces cerevisiaethrough time;

[0085]FIG. 48b shows that nonlinear rate laws allow a cell to transitionfrom a normal state to an abnormal one;

[0086]FIGS. 49a, 49 b, and 49 c show the pathogen Trypanosoma brucei(responsible for sleeping sickness in humans) on which Cyber-Cell hasbeen tested;

[0087]FIG. 49a shows the “long and slender” form of the pathogen;

[0088]FIG. 49b shows the pathogen in its “sturnpy” form;

[0089]FIG. 49c is a graph of predicted concentrations of species withinthe glycosome as a function of time;

[0090]FIG. 50 is a table comparing measured steady state concentrationsand the values predicted by Cyber-Cell as shown in FIG. 49c;

[0091]FIGS. 51a and 51 b illustrate kinetics studies of the T7 family ofDNA-dependent RNA polymerases;

[0092]FIG. 51a graphs Cyber-Cell's predictions;

[0093]FIG. 51b displays measured data;

[0094]FIG. 52 shows Cyber-Cell's simulation of the transcription of theHIV-1 Philadelphia strain;

[0095]FIGS. 53a and 53 b portray the inner workings of the Cyber-Cellsimulator as imbedded in an information theory algorithm;

[0096]FIG. 53a summarizes the data that Cyber-Cell can integrate;

[0097]FIG. 53b shows an exemplary flow chart of theCyber-Cell/information theory process;

[0098]FIG. 54 shows complex polymerization chemical kinetics models usedin the Cyber-Cell simulator;

[0099]FIGS. 55a and 55 b portray the morphology of mesoscopic objects;

[0100]FIG. 55a shows an interior medium surrounded by a boundingsurface;

[0101]FIG. 55b shows the effect of molecular shape on the curvature ofthe bounding surface;

[0102]FIGS. 56a and 56 b are graphs of the effects of noise inexperimental data;

[0103]FIG. 56a graphs the results for 0.3% noise without regularization;

[0104]FIG. 56b graphs the results for 2% and 3% noise withregularization; and

[0105]FIG. 57 is a graph of the uncertainty calculated by the Cyber-Cellsimulator.

DETAILED DESCRIPTION OF THE INVENTION

[0106] Turning to the drawings, the invention is illustrated as beingimplemented in a suitable environment. The following description isbased on embodiments of the invention and should not be taken aslimiting the invention with regard to alternative embodiments that arenot explicitly described herein. A first embodiment, a geologic basinsimulator, is described in Sections I through VIII. Sections IX throughXI describe a second embodiment of the invention, a simulator of livingcells.

I. Technical Overview of Simulation-Enhanced Fracture Detection

[0107] An embodiment of the present invention enhances seismic methodsby using a 3-D reaction, transport, mechanical (RTM) model called BasinRTM. Remote observations provide a constraint on the modeling and, whenthe RTM modeling predictions are consistent with observed values, therichness of the RTM predictions provides detailed data needed toidentify and characterize fracture sweetspots (reservoirs). Thissimulation-enhanced fracture detection (SEFD) scheme is depicted in FIG.40. The Figure indicates the relation between the input “raw” data andthe exploration and production (E&P) output data. Circles indicateprocessing software, and boxes are input and output information. TheSEFD module compares the predicted and observed values of seismic,geological, and other parameters and terminates the iteration when thedifference (E) is below an acceptable lower limit (E_(c)). SEFD makesthe integration of remote measurement and other observations withmodeling both efficient and “seamless.”

[0108] The SEFD algorithm has options for using raw or interpretedseismic data. The output of a 3-D basin simulator, Basin RTM, lithologicinformation, and other data are used as input to a synthetic seismicprogram. The latter's predicted seismic signal, when compared with theraw data, is used as the error measure E as shown in FIG. 40. Similarly,well logs and other raw or interpreted data shown in FIG. 1 can be used.The error is minimized by varying the least well constrained basinparameters. This error minimization scheme is embedded in informationtheory approaches to derive estimates of uncertainty. The basinsimulation scheme of FIG. 40 can be integrated with, or replaced by, oneinvolving a reservoir simulator as suggested in FIGS. 40 and 41.

[0109] The SEFD method integrates seismic data with other E&P data(e.g., well logs, geochemical analysis, core characterization,structural studies, and thermal data). Integration of the data isattained using the laws of physics and chemistry underlying the basinmodel used in the SEFD procedure:

[0110] conservation of momentum for fluid and solid phases;

[0111] conservation of mass for fluid and solid phases; and

[0112] conservation of energy.

[0113] (See FIG. 3.) These laws facilitate extrapolation away from thesurface and wellbore and are made consistent with seismic data to arriveat the SEFD approach shown in FIGS. 1, 40, and 41.

[0114] The Basin RTM model is calibrated by comparing its predictionswith observed data from chosen sites. Calibration sites meet thesecriteria: richness of the data set and diversity of tectonic setting andlithologies (mineralogy, grain size, matrix porosity). FIG. 2 listsseveral sites for which extensive data sets have been gathered. Datainclude the complete suite of formation depths, age, and lithologiccharacter as well as analysis of thermal, tectonic, and sea levelhistory.

[0115] Basin RTM attains seismic invertibility by its use of many keyfracture prediction features not found in other basin models:

[0116] nonlinear poroelasticity/viscosity rheology integrated withpressure solution, fracture strain rates, and yield behavior forfaulting;

[0117] a full 3-D fracture network statistical dynamics model;

[0118] rheologic and multi-phase parameters that coevolve withdiagenesis, compaction, and fracturing;

[0119] new multi-phase flow and kerogen reactions producing petroleumand affecting overpressure;

[0120] tensorial permeability from preferred fracture orientation andconsequent directed flows;

[0121] inorganic fluid and mineral reactions and organic reactions; and

[0122] heat transfer.

[0123] (See FIG. 3.) While previous models have some of these processes,none have all, and none are implemented using full 3-D, finite-elementmethods. Basin RTM preserves most couplings between the processes shownin FIG. 3. The coupling of these processes in nature implies that tomodel any one of them requires simulating all of them simultaneously. Asfracturing couples to many RTM processes, previous models with only afew such factors cannot yield reliable fracture predictions. Incontrast, the predictive power of Basin RTM, illustrated in FIGS. 4through 12, 14, 16 through 18, 23, and 33, and discussed further below,surmounts these limitations.

[0124] Commonly observed “paradoxes” include fractures without flexureand flexure without fractures. These paradoxes illustrate the inadequacyof previous fracture detection techniques based on statisticalcorrelations. For example, previous models base porosity history on aformula relating porosity to mineralogy and depth of burial. However,porosity evolves due to the detailed stress, fluid composition andpressure, and thermal histories of a given volume element of rock. Thesehistories are different for every basin. Thus, in the real world, thereis no simple correlation of porosity with depth and lithologic type. Asshown in FIG. 3, aspects of geological systems involve a multiplicity offactors controlling their evolution. Some processes arememory-preserving and some are memory-destroying. Therefore, there areno simple correlations among today's state variables. The detailedhistory of processes that operated millions of years ago determinestoday's fracture systems. Basin RTM avoids these problems by solving thefully coupled rock deformation, fluid and mineral reactions, fluidtransport, and temperature problems (FIGS. 3 and 13). Basin RTM derivesits predictive power from its basis in the physical and chemical lawsthat govern the behavior of geological materials.

II. Details of an Exemplary Embodiment of the Geologic Basin Simulator

[0125] The variables predicted by the Basin RTM simulator throughout thespace and during the time of a basin simulation include:

[0126] pressure, composition, and saturation of each pore fluid phase;

[0127] temperature and stress;

[0128] size, shape, and packing of the grains of all minerals;

[0129] fracture network (orientation, aperture, length, andconnectivity) statistics; and

[0130] porosity, permeability, relative permeabilities, and capillarypressures.

[0131] This data can be used directly or through transformation (e.g.,synthetic seismic signals, well logs) to provide a measure of agreementswith observations as needed for information theory integration of dataand modeling. To make these predictions, however, the Basin RTMsimulator needs information on phenomenological parameters and basinhistory parameters (sedimentary, basement heat flux, overall tectonic,and other histories) which themselves are often poorly constrained.

[0132] The basin model:

[0133] includes formulas relating fluid/rock state to well logging toolresponse;

[0134] includes a chemical kinetic model for type-II kerogen and oilcracking that simulates deep gas generation, models the relation betweenvitrinite reflectance and the kerogen composition, and integrates theabove with the 3-D multi-phase, miscible fluid flow model;

[0135] implements the measured data/Basin RTM integration technology asshown in FIG. 1; and

[0136] expands and formats a basin database for use as in FIG. 1 anduses graphics modules to probe the data.

[0137] A complex network of geochemical reactions, fluid and energytransport, and rock mechanical processes underlies the genesis,dynamics, and characteristics of petroleum reservoirs in Basin RTM(FIGS. 3 and 13). Because prediction of reservoir location andproducibility lies beyond the capabilities of simple approaches as notedabove, Basin RTM integrates relevant geological factors and RTMprocesses (FIG. 13) in order to predict fracture location andcharacteristics. As reservoirs are fundamentally 3-D in nature, BasinRTM is fully 3-D.

[0138] The RTM processes and geological factors used by Basin RTM aredescribed in FIGS. 3 and 13. External influences such as sediment input,sea level, temperature, and tectonic effects influence the internal RTMprocesses. Within the basin, these processes modify the sedimentchemically and mechanically to arrive at petroleum reserves, basincompartments, and other internal features.

[0139] Basin RTM predicts reservoir producibility by estimating fracturenetwork characteristics and effects on permeability due to diageneticreactions or gouge. These considerations are made in a self-consistentway through a set of multi-phase, organic and inorganic,reaction-transport and mechanics modules. Calculations of these effectspreserve cross-couplings between processes (FIGS. 3 and 13). Forexample, temperature is affected by transport, which is affected by thechanges of porosity that changes due to temperature-dependent reactionrates. Basin RTM accounts for the coupling relations among the full setof RTM processes shown in FIG. 3.

[0140] Key elements of the dynamic petroleum system include a full suiteof deformation mechanisms. These processes are strongly affected bybasin stress history. Thus, good estimates of the evolution of stressdistributions are necessary in predicting these reservoircharacteristics. As fracturing occurs when fluid pressure exceeds leastcompressive stress by tensile rock strength, estimates of the time offracture creation, growth, healing or closure, and orientation rely onestimates of the stress tensor distribution and its history. Simpleestimates of least compressive stress are not sufficient for accuratepredictions of fracturing. For example, least compressive stress canvary greatly between adjacent lithologies-a notable example beingsandstones versus shales. (See FIGS. 3, 5, 7 through 12, and 14). InBasin RTM, stress evolution is tightly coupled to other effects.Fracture permeability can affect fluid pressure through the escape offluids from overpressured zones, in turn, fluid pressure stronglyaffects stress in porous media. For these reasons, the estimation of thedistribution and history of stress should be carried out within a basinmodel that accounts for the coupling among deformation and otherprocesses as shown in FIG. 3.

[0141] A rock Theological model based on incremental stress theory isincorporated into Basin RTM. This formalism has been extended to includefracture and pressure solution strain rates with elastic and nonlinearviscous/plastic mechanical rock response. This rheology, combined withforce balance conditions, yields the evolution of basin deformation. TheBasin RTM stress solver employs a moving, finite-element discretizationand efficient, parallelized solvers. The incremental stress rheologyused is {dot over (ε)}={dot over (ε)} ^(el)+{dot over (ε)} ^(in)+{dotover (ε)} ^(ps)+{dot over (ε)} ^(fr). Here {dot over (ε)} is the netrate of strain while the terms on the right hand side give the specificdependence of the contributions from poroelasticity (el), continuousinelastic mechanical (in), pressure solution (ps), and fracturing (fr).The boundary conditions implemented in the Basin RTM stress module allowfor a prescribed tectonic history at the bottom and sides of the basin.

[0142] The interplay of overpressuring, methanogenesis, mechanicalcompaction, and fracturing is illustrated in FIG. 4a. In FIG. 4b, asource rock in the Ellenburger Formation of the Permian Basin (WestTexas) is seen to undergo cyclic oil expulsion associated withfracturing.

[0143] In FIGS. 9a and 9 b, the results of Basin RTM showfault-generated fractures and their relation to the creation offracture-mediated compartments and flow. In FIG. 9a, the shadingindicates porosity and shows differences between the four lithologies;the shales (low porosity) are at the middle and top of the domain.Higher porosity regions (in the lower-right and upper-left comers) andthe fracture length (contour lines) arose due to the deformation createdby differential subsidence. The arrows indicate fluid flow toward theregion of increasing porosity (lower-right) and through the mostextensively fractured shale. FIG. 9b shows the predicted direction andmagnitude of fluid flow velocity. This system shows the interplay ofstress, fracturing, and hydrology with overall tectonism-features whichgive Basin RTM its power.

[0144] A key to reservoirs is the statistics of the fracture network.Basin RTM incorporates a unique model of the probability for fracturelength, aperture, and orientation. The model predicts the evolution intime of this probability in response to the changing stress, fluidpressure, and rock properties as the basin changes. (See FIGS. 7 and14). The fracture probability formulation then is used to compute theanisotropic permeability tensor. The latter affects the direction ofpetroleum migration, information key to finding new resources. It alsois central to planning infill drilling spacing, likely directions forfield extension, the design of horizontal wells, and the optimum rate ofproduction.

[0145]FIG. 14 shows a Basin RTM simulation for the Andector Field(Permian Basin, West Texas).

[0146] The fracture network is dynamic and strongly lithologicallycontrolled. FIG. 7 shows predicted fracture orientations and lengths formacrovolume elements in shale (top) and sandstone (bottom) at four timesover the history of the Piceance Basin study area. Changing sedimentproperties, stress, and fluid pressure during the evolution of the basinresult in the dynamic fracture patterns. Understanding such occurrencesof the past, therefore, can be important for identifying orunderstanding reservoirs in presently unlikely structural andstratigraphic locations. The fractures in a shale are more directionaland shorter-lived; those in the sandstone appear in all orientationswith almost equal length and persist over longer periods of geologicaltime.

[0147] The 3-D character of the fractures in this system is illustratedin FIGS. 5, 8a, and 8 b. In FIG. 8a, the folded, multi-layered structureis dictated by the interplay of lithological differences and fracturingand shows the 3-D complexity of the connectivity of over-pressuredzones. Thus, using a simple pressure-depth curve to model stackedover-pressured compartments may yield little insight into the fullthree-dimensionality of the structure.

[0148] Modules in Basin RTM compute the effects of a given class ofprocesses (FIGS. 3 and 13). The sedimentation/erosion history recreationmodule takes data at user-selected well sites for the age andpresent-day depth, thickness, and lithology and creates the history ofsedimentation or erosion rate and texture (grain size, shape, andmineralogy) over the basin history. The multi-phase and kerogendecomposition modules add the important component of petroleumgeneration, expulsion, and migration (FIGS. 6a, 6 b, 6 c, 10, and 11).Pressure solution modules calculate grain growth/dissolution at freefaces and grain-grain contacts. The evolution of temperature isdetermined from the energy balance. Physico-chemical modules are basedon full 3-D, finite-element implementation. As with thestress/deformation module, each Basin RTM process and geological dataanalysis module is fully coupled to the other modules (FIGS. 3 and 13).

[0149] The continuous aspects of the Basin RTM rheology for chalk andshale lithologies are calibrated using published rock mechanical dataand well studied cases wherein the rate of overall flexure orcompression/extension have been documented along with rock texture andmineralogy. Basin RTM incorporates calibrated formulas for theirreversible, continuous, and poroelastic strain rate parameters andfailure criteria for chalk and shale needed for incremental stressrheology and the prediction of the stresses needed for fracture andfault prediction.

[0150] The texture model incorporates a relationship between rockcompetency and grain-grain contact area and integrates the rockcompetency model with the Markov gouge model and the fracture networkstatistics model to arrive at a complete predictive model of faulting.

[0151] Basin RTM's 3-D grid adaptation scheme (1) is adaptive so thatcontacts between lithologic units or zones of extreme textural changeare captured; and (2) preserves all lithologic contacts.

[0152] In the information theory approach of FIGS. 1, 40, and 41, BasinRTM is optimized whereby parameters that are key to the predictions, yetare less well known, are computed by (1) generating a least-square orother error (that represents the difference between the actual data andthat predicted by Basin RTM and seismic recreation programs), and (2)minimizing the error and also imposing physical constraints on the timeand length scales on which tectonic and other parameters can change.

[0153] A chemical kinetic model of natural gas generation from coal isused to model the deep gas generation. The new kinetic model for gasgeneration is based on the structure of lignin, the predominantprecursor molecule of coal. Structural transformations of ligninobserved in naturally matured samples are used to create a network ofeleven reactions involving twenty-six species. The kinetic modelrepresenting this reaction network uses multi-phase reaction-transportequations with n^(th) order processes and rate laws. For the immobilespecies, i.e., those bound with the kerogen, the rate equations take theform $\begin{matrix}{\frac{D\quad C_{i}}{Dt} = {\sum\limits_{\alpha}{v_{{i\alpha}\quad}k_{\alpha}^{eff}{\prod\limits_{i^{\prime},{v_{i^{\prime}\alpha} > 0}}\quad C_{i^{\prime}}^{v_{i\alpha}}}}}} & (1)\end{matrix}$

[0154] where C_(i) is moles of immobile kerogen species i per kerogenvolume and k_(α) ^(eff) is an effective rate coefficient for reaction αthat consumes one or more reactant molecules (ν_(αi)<0) and generatesproduct molecules (ν_(ia)>0). The model assumes that the kerogenreactions are irreversible. (See FIG. 26.)

[0155] To predict petroleum composition and to take full advantage ofthe vitrinite and fluid inclusion data, the model uses a chemicalkinetic model of kerogen and petroleum reaction kinetics. It includesover twenty species in a model of kerogen or oil to thermal breakdownproducts based on a chemical speciation/bond breaking approach similarto that developed for lignin kinetics. The model uses a hydrocarbonmolecular structure/dynamics code to guide the macroscopic kineticmodeling.

[0156] The model also incorporates a risk assessment approach based oninformation theory. The method differs from others in geostatistics inthat it integrates with basin simulation as follows. Information theoryprovides a method to objectively estimate the probability ρ of a givenset A (=A₁, A₂, . . . , A_(N)) of N parameters which are the mostuncertain in the analysis. For the present example, these includebasement heat flux, overall tectonics, sedimentation/erosion history,etc. The entropy S is then introduced via S=−∫d^(N) Aρlnρ which is foundto be an objective measure of uncertainty. The information theoryapproach is then to maximize S constrained by the information known, theresult being an expression for the A-dependence of ρ. An example ofprobability function p for the radius of the enhanced permeability zonein FIG. 27a is shown in FIG. 27c. Note that as the tolerable error isdecreased, the function approaches the Dirac delta function located atr=1000 meters which is the actual radius of the enhanced permeabilityzone. With such an approach, the model computes the expected locationand state of a reservoir and provides quantitative measures of theuncertainties in this prediction.

[0157] In this approach, the results of a Basin RTM simulation or of areservoir simulation yields a set of M predicted variables Ω(=Ω₁, Ω₂, .. . , Ω_(M)). These include porosity, permeability, and mineralogy,geochemical and thermal data, and fracture statistics from which themodel calculates synthetic seismic well log and geochemical data. Thesepredictions depend on A via the Basin RTM or reservoir simulator.Setting the average of the Ω to observed values O₁, O₂, . . . , O_(M) ofthese quantities yields constraints on p. Then maximizing S subject tothese constraints (observations) yields ρ(A). With ρ(A), the modelprovides not only a prediction of the most likely values of the N As,but also of the variance in the As. Thereby, the model computes thevariance in predicted reservoir characteristics. Through the integrationof this approach with data/modeling technology, the model provides therisk analysis the industry needs to assess the economics of a givenstudy area.

[0158] The key is that the relation Ω_(i)(A) can only be obtainedthrough simulations. To surmount the need for using an exceedingly greatamount of computer time for each simulation, the model carries outselective simulations and then fits the Ω_(i)(A) to an analytic functionby least square or other fitting. Next, the model finds the value of Aminimizing the error and then refines the computation in the vicinity ofthe first approximate value minimizing the error.

[0159] Risk assessment is a key aspect of the data/modeling integrationstrategy There are uncertainties in the geological data needed for inputto Basin RTM (notably overall tectonic, sedimentary, and basement heator mass flux). This leads to uncertainties in data/modeling integrationpredictions. The model addresses this key issue with a novel informationtheory approach that automatically embeds risk assessment intodata/modeling integration as an additional outerlooping in the flowchart of FIG. 1.

[0160] Geostatistical methods are extensively used to construct thestate of a reservoir. Traditional geostatistical methods utilize staticdata from core characterizations, well logs, seismic, or similar typesof information. However, because the relation between production andmonitoring well data (and other type of dynamic data) and reservoirstate variables is quite complicated, traditional geostatisticalapproaches fail to integrate dynamic and static data. Two significantmethods have been developed to integrate the dynamic flow of informationfrom production and monitoring wells and the static data. The goal ofboth methods is to minimize an “objective function” that is constructedto be a measure of the error between observations and predictions. Themultiple data sets are taken into consideration by introducing weightingfactors for each data set. The first method (sequentialself-calibration) defines a number of master points (which is less thanthe number of grid points on which the state of the reservoir is to becomputed). Then a reservoir simulation is performed for an initial guessof the reservoir state variables that is obtained by the use oftraditional geostatistical methods. The nonlinear equations resultingfrom the minimization of the objective function requires the calculationof derivatives (sensitivity coefficients) with respect to the reservoirstate variables. The approximate derivatives are efficiently obtained byassuming that stream lines do not change because of the assumed smallperturbations in the reservoir state variables. In summary, thesequential self-calibration method first upscales the reservoir using amultiple grid-type method and then uses stream line simulators toefficiently calculate the sensitivity coefficients. A difficulty in thisprocedure is that convergence to an acceptable answer is typically notmonatomic (and is thereby slow and convergence is difficult to assess).The second method (gradual deformation) expresses the reservoir state asa weighted linear sum of the reservoir state at the previous iterationand two new independent states. The three weighting factors aredetermined by minimizing the objective function. The procedure isiterated using a Monte Carlo approach to generate new states. The greatadvance of the present approach over these methods is that (1) itdirectly solves a functional differential equation for the most probablereservoir state and (2) has a greatly accelerated numerical approachthat makes realistic computations feasible.

[0161] To use well logs in the data/modeling scheme of FIG. 1, the modelgeneralizes formulas from the literature (FIG. 24) relating log toolresponse to fluid/rock state. A synthetic sonic log for the PiceanceBasin of Colorado is shown in FIG. 25a. This log was computed usingBasin RTM-predictions of the size, shape, and packing of the grains ofall minerals, porosity, pore fluid composition, and phase (state ofwetting), and fracture network statistics. The variation in the p-wavevelocity is a combined result of density variation and mineralcomposition, as well as fracture network properties.

[0162] III. Geologic Data Types and Availability

[0163] Geological input data are divided into four categories (FIG. 13).The tectonic data gives the change in the lateral extent and the shapeof the basement-sediment interface during a computational advancementtime δt. Input includes the direction and magnitude ofextension/compression and how these parameters change through time.These data provide the conditions at the basin boundaries needed tocalculate the change in the spatial distribution of stress and rockdeformation within the basin. This calculation is carried out in thestress module of Basin RTM.

[0164] The next category of geological input data directly affects fluidtransport, pressure, and composition. This includes sea level, basinrecharge conditions, and the composition of fluids injected from theocean, meteoric, and basement sources. Input includes the chemicalcomposition of depositional fluids (e.g., sea, river, and lake water).This history of boundary input data is used by the hydrologic andchemical modules to calculate the evolution of the spatial distributionof fluid pressure, composition, and phases within the basin. Thesecalculations are based on single- or multi-phase flow in a porous mediumand on fluid phase molecular species conservation of mass. Thephysico-chemical equations draw on internal data banks forpermeability-rock texture relations, relative permeability formulae,chemical reaction rate laws, and reaction and phase equilibriumthermodynamics.

[0165] The spatial distribution of heat flux imposed at the bottom ofthe basin is another input to Basin RTM. This includes either basin heatflow data or thermal gradient data that specify the historicaltemperature at certain depths. This and climate/ocean bottom temperaturedata are used to evolve the spatial distribution of temperature withinthe basin using the equations of energy conservation and formulas anddata on mineral thermal properties.

[0166] Lithologic input includes a list and the relative percentages ofminerals, median grain size, and content of organic matter for eachformation. Sedimentation rates are computed from the geologic ages ofthe formation tops and decomposition relations.

[0167] The above-described geological input data and physico-chemicalcalculations are integrated in Basin RTM over many time steps δt toarrive at a prediction of the history and present-day internal state ofthe basin or field. Basin RTM's output is rich in key parameters neededfor choosing an E&P strategy: the statistics of fracture length,orientation, aperture, and connectivity, in situ stress, temperature,the pressure and composition of aqueous and petroleum phases, and thegrain sizes, porosity, mineralogy, and other matrix textural variables.

[0168] For many basins worldwide, the petroleum industry has largestores of data. A large portion of these data, often acquired at greatexpense, has not been adequately used. The basin model provides arevolutionary approach that automatically synthesizes these data for E&Panalysis, notably the special challenges of deep petroleum andcompartmented or fractured regimes. The typical information availableincludes seismic, well log, fluid inclusion, pore fluid composition andpressure, temperature, vitrinite reflectance, and corecharacterizations. (See FIGS. 1, 2, 15, 19 through 21, 29 b, and 31).Examples of data and locations in U.S. basins are seen in FIGS. 15through 21.

[0169] The use of these data presents several challenges:

[0170] the need to extrapolate away from the well or down from thesurface;

[0171] omnipresent noise or other measurement error;

[0172] the time-consuming nature of the manual interpretation of thisdata; and

[0173] the lack of an unambiguous prediction of reservoir location andcharacteristics from these data.

[0174] In the latter context, well logs or seismic data, for example,cannot be used to unambiguously specify the local fluid/rock state(shape, packing and mineralogy, grain size, porosity, pore fluidcomposition, and fracture network statistics). In the present approach,the uniqueness of the fluid/rock state to seismic/well log responserelationship is exploited (similarly for the geochemical data). Thisavoids the ambiguity in the inverse relationship, seismic/well log datato fluid/rock state, on which log or seismic interpretation is based inother approaches.

[0175] The pathway to achieving this goal is via comprehensive basinmodeling and information theory. The basin model is a three-dimensionalmodel that uses finite-element simulations to solve equations of fluidand mineral reactions, mass and energy transport, and rock mechanics topredict the fluid/rock state variables needed to compute seismic, welllog, and other data. The difference between the basin model-predictedwell log and geochemical data and the actual observed data provides amethod for optimizing both the interpretation of the data and therichness of the reservoir location and characteristics predicted by the3-D model, Basin RTM. (See FIGS. 1, 40, and 41.) Information theoryprovides a methodology whereby these data and the modeling can be usedto estimate uncertainty/risk in predictions.

[0176] The model focuses on well logs, seismic data, fluid pressure,vitrinite reflectance, and fluid inclusions. It includes formulas thatyield the synthetic data from the rock/fluid state as predicted by theBasin RTM output variables. The Basin RTM organic kinetics modelpredicts the many chemical species quantified in the pore fluidcomposition, fluid inclusion, and vitrinite reflectance data.

[0177]FIGS. 29a and 29 b summarize the Anadarko Basin data presentlyavailable. Over 25 lithologies have been dated and described texturallyand mineralogically. These data are complemented with additionalseismic, well log, and other data.

[0178] The tools used to browse the database include isosurfaces,cross-sections, and probes along any line. They are in the form offluid/rock state variables as a function of depth or as synthetic logsfor easy comparison with additional data available to the user. The 1-Dprobe can be placed anywhere in the basin to yield any of a hundredfluid/rock state variables as a function of depth, as suggested in FIG.30.

[0179] Relations between well log response and fluid/rock state havebeen set forth for a number of logging tools. A brief summary oftheoretical formulas or experimental correlations and references isgiven in FIG. 24. The published and new fluid/rock state to log toolresponse relations are recast in terms of the specific fluid/rockvariables predicted by Basin RTM.

IV. Salt Tectonic Petroleum Regimes

[0180] As salt withdrawal is an important factor in fracturing in somebasins, Basin RTM models salt tectonics. (See FIGS. 10 through 12.)Basin RTM addresses the following E&P challenges:

[0181] predict the location and geometry of zones of fracturing createdby salt motion;

[0182] predict the morphology of sedimentary bodies created by saltdeformation;

[0183] locate pools of petroleum or migration pathways created by salttectonics; and

[0184] assist in the interpretation of seismic data in salt tectonicregimes.

[0185] The interplay of salt deformation with the rheology of thesurrounding strata is key to understanding the correlation between saltdeformation and reservoir location. FIGS. 10 through 12 show simulationresults produced by Basin RTM. In FIG. 10, source rock overlying thedome was transiently overpressured and fractured, facilitating upwardoil migration within it and into the overlying layers. Orientations oflong-lived fractures (residing in the sandstones) illustrate therelationship between the salt motion and fracture pattern. FIG. 11 issimilar to FIG. 10 except for an initially finite size (lenticular) saltbody. FIG. 11 also adds the co-evolution of subsalt petroleum. It showsthe oil saturation with curves indicating lithologic contacts. Theoverpressure under the salt body and the stress regime on the underlyingsediment have preserved porosity in the center region under the saltwhile the compaction under the edge of the salt led to the formation ofa seal. In the quarter section of a salt diaper simulated in FIG. 12,the relationship to fracturing in the overlying sandstones after 3million years of deformation is shown. It is the integration of thesetypes, of simulations with a suite of geological data throughinformation theory that gives them a greatly enhanced potential forpredicting reservoir location and characteristics and associated risksand uncertainties.

V. Compartmental Petroleum Regimes

[0186] A sedimentary basin is typically divided into a mosaic ofcompartments whose internal fluid pressures can be over (OP) or under(UP) hydrostatic pressure. An example is the Anadarko Basin as seen inFIGS. 21a, 21 b, 21 c, 21 d, and 22. Compartments are common featuresworldwide. Compartments are defined as crustal zones isolated in threedimensions by a surrounding seal (rock of extremely low permeability).Identifying them in the subsurface is key to locating by-passedpetroleum in mature fields. Extensive interest in these phenomena hasbeen generated because of their role as petroleum reservoirs.

[0187] Compartmentation can occur below a certain depth due to theinterplay of a number of geological processes (subsidence,sedimentation, and basement heat flux) and physico-chemical processes(diagenesis, compaction, fracturing, petroleum generation, andmulti-phase flow). These compartments exist as abnormally pressured rockvolumes that exhibit distinctly different pressure regimes in comparisonwith their immediate surroundings, thus they are most easily recognizedon pressure-depth profiles by their departure from the normalhydrostatic gradient. The integration of basin modeling and data throughinformation theory allows one to more accurately predict the locationand characteristics of these compartments

[0188] Integrated pore-pressure and subsurface geological data indicatethe presence of a basinwide, overpressured compartment in the AnadarkoBasin. This megacompartment complex (MCC) is hierarchical, i.e.,compartments on one spatial scale can be enclosed by compartments onlarge spatial scales. (See FIG. 21a.) The Anadarko MCC encompasses theMississippian and Pennsylvanian systems, and it remained isolatedthrough a considerably long period of geological time (early Missourianto present). Compartments within the MCC are isolated from each other bya complex array of seals. Seal rocks often display unique diageneticbanding structures that formed as a result of the mechano-chemicalprocesses of compaction, dissolution, and precipitation.

[0189] Data from the Piceance Basin have been used with Basin RTM toevaluate the fluid pressure history of the coastal interval sandstone(Upper Cretaceous Mesaverde Group in the Piceance Basin, northwestColorado) with gas saturation (pore volume occupied by gas phasegenerated from underlying source rocks) (FIG. 24). Starting at about 52Ma, after incipient maturation of the underlying source rock (thepaludal interval coal), gas is initially transported into the sandstonedissolved in pore fluids. Aqueous methane concentration increases asmore gas is generated from maturing source rocks and as pore fluidmigrates upward into the sandstone from compacting and overpressuringsource rocks below. Aqueous methane concentration continues to increaseuntil its peak at about 25 Ma. At this time, aqueous methaneconcentration begins to decrease and the free gas phase forms. The gasphase is exsolving from the aqueous phase because uplift and erosion aredecreasing the confining stresses and decreasing the solubility of thegas in the aqueous phase. Aqueous methane continues to decline for theremainder of the simulation, and gas saturation is maintained at about20%.

[0190] Deep gas and by-passed petroleum in compartmented reservoirs(e.g., the Anadarko Basin) likely constitute the most promising naturalgas resources for the United States as recent discoveries indicate. Themodel's current focus on such regimes addresses a number of criticalresearch needs as these systems are still poorly understood from boththe exploration and production standpoints. As the novel data/basinmodeling interpretation greatly improves the ability to predict thelocation and characteristics of these reservoirs, the results assist inboth improving energy independence and the efficiency with which theseregimes are explored.

VI. Petroleum Reservoirs, CO₂ and Waste Sequestration, and PollutantMigration

[0191] Several aspects of the oil industry may be addressed by thepresent invention: (a) time-lapse production of oil fields for improvedperformance; (b) monitoring of enhanced oil-production using injectedfluids such as CO₂; (c) reduced greenhouse gas emissions at localizedwell sites; and (d) reduction in greenhouse gases produced bywide-spread use of petroleum.

[0192] The objective of time-lapse production of oil fields is toproduce the most oil from a reservoir over its lifetime using the fewestnumber of wells. Monitoring techniques such as time-lapse 3-D surfaceseismic and high-resolution crosswell seismology are good indicators ofthe current state of the reservoir. But these data along with productioninformation need to be incorporated into a physico-chemical modelingapproach that will enable reservoir predictions and the impliedstrategies. Only with the advent of time-lapse monitoring of a reservoirin recent years has this synergy with modeling become feasible.

[0193] Enhanced oil recovery by injecting fluids into a reservoir can bea costly prospect resulting in millions of spent dollars. It isimportant to know where the injected fluid and petroleum migrate tooptimize the location of injection and producing wells. Recovery andreuse of the injected fluids and depth are important cost reductionissues.

[0194] The technology minimizes losses due to by-passed reserves,formation damage, drilling costs, and excessive water (vs. petroleum)production. Such problems arise in both high and low matrix permeabilitysystems and commonly occur in cases where reservoirs are compartmentedor contain zones of super-K (i.e., regions of karst or wide-aperture,connected fractures-leading to anomalously high local permeability). Anapproach to such systems should be based on a quantifiedcharacterization of the reservoir away from the wellbore and down fromthe surface. The present approach incorporates the following:

[0195] production history, well log, seismic, and other data;

[0196] estimation of uncertainties and risk in next well citing andproduction strategy; and

[0197] available basin and reservoir simulators.

[0198] FDM integrates all the above in one automated procedure thatyields a continuously updated forecast and strategy for the futuredevelopment and production of a field. It achieves this through softwarethat integrates reservoir simulation, data, and information theory.

[0199] In the cases shown in FIGS. 39a, 39 b, and 39 c, there aredifficulties in placing wells and planning the best production ratesfrom existing wells to minimize by-passed reserves and excessive watercuts. In FIG. 39a, the upper and lower reservoirs are separated by aseal in a poorly defined region. In FIG. 39b, pinchout separates asandstone reservoir into two poorly connected regimes. In FIG. 39c, azone of super-K can direct flows around petroleum-saturated matrix andthus lead to by-passing of reserves. The key to making successfuldecisions is quantifying the geometry of reservoir connectivity orcompartmentation. The present approach places quantitative limits on thelocation, shape, and extent of the zones of super-K or connectivity toother reservoirs or parts of the same, multi-lobed reservoir.

[0200] The present approach allows for the following:

[0201] A new multi-phase flow law that accounts for the changing wettingand intra-pore geometry (and associated hysteresis) of the fluid phases.This overcomes the weaknesses of other multi-phase models. The flow lawsand related reservoir simulator describe CO₂ injection and simultaneousenhanced petroleum recovery with sufficient pore scale detail tocalculate the seismic velocity and attenuation needed to interprettomographic images.

[0202] Advanced formulas for the dependence of seismic wave speed andattenuation (as predicted by the new multi-phase flow model) on fluidphase geometry, fractures, and grain size, shape, mineralogy, andpacking to achieve enhanced seismic image interpretation. Thesedependencies are not accounted for in a self-consistent and simultaneousmanner in other seismic image interpretation approaches.

[0203] By integrating the seismic wave velocity and attenuation formulaswith the multi-process reservoir simulator, an automated approach isobtained that is a qualitative improvement in both the interpretation ofcrosswell tomographic images of the CO₂ plume and other evolvingrepository features and that improves the accuracy of reservoirsimulation. The reservoir model can predict sufficient information tocompute the seismic wave velocities and attentions and, thereby, achievethis integration.

[0204] The information theory-based approach for estimating the mostprobable reservoir state and associated risk allows for the automationof the delineation of reservoir size, shape, CO₂ plume characteristics,internal distribution of porosity, and multiphase flow properties, aswell as integration of reservoir simulation and crosswell tomographicimage interpretation.

[0205] A novel numerical algorithm for solving the inverse problem is amajor improvement over simulated annealing and other procedures. Thetechnique captures the 3-D complexity of a repository.

[0206] The availability of accurate predictive models and of techniquesfor monitoring the time-course of an injected waste plume are key to theevaluation of a strategy for CO₂ and other fluid waste disposal ingeological repositories. The present method addresses both of theserequirements using novel modeling and modem seismic imaging methods andintegrates them via information theory for predicting and monitoring thetime course for original and injected fluids. The technology can be usedto optimize the injection process or to assess the economic viability ofthis disposal approach. The method combines new physical and chemicalmulti-phase modeling techniques, computational methods, informationtheory, and seismic data analysis to achieve a completely automatedmethod. As such, the method is of great fundamental interest indelineating the dynamics of the subsurface and of great practical valuein a variety of waste disposal and resource recovery applications.

[0207] Substantial potential exists for environmentally soundsequestration of CO₂ or other waste fluids in geological formations withhigh matrix or vuggy porosity/permeability. These include depleted orproducing oil and gas reservoirs and brine-filled formations. Thewidespread geographical distribution of such sites, and the possibilityfor simultaneous CO₂ sequestration and enhanced petroleum recovery, makethis technology of great potential value.

[0208] Geological sequestration of CO₂ requires that the CO₂ betransported into the formation, displacing gas or liquid initiallypresent, and trapping CO₂ in the formation for stable, long-termstorage. A critical component of a storage strategy is to understand themigration and trapping characteristics of CO₂ and the displaced fluids.This is a multi-phase, porous medium, reaction-transport system.Modeling CO₂ migration and trapping requires a quantitative descriptionof the associated reaction, transport, and mechanical processes from thepore to the field scale. The challenge is made even greater as much ofthe state of porosity, permeability, and other reservoir characteristicsare only known statistically, implying the need for a reliable riskassessment approach.

[0209] Crosswell tomography can delineate an image of the CO₂ plume. InFIG. 31, the two darkest gray values represent the largest velocitydecrease due to CO₂ of about 1.5 to 2%. The velocity difference becomessmaller for consecutive gray levels from the two darkest gray valueswhile white indicates no velocity difference. However, seismic wavespeed and attenuation depend on many reservoir factors that can changeduring injection (porosity, pore fluid phase and configuration, grainsize, shape, mineralogy, and packing and fracture network statistics).Thus an unambiguous delineation of the CO₂ plume, and not other changingreservoir characteristics induced by injection, requires additionalinformation. The present method solves this noninvertability problem byintegrating multiple process reservoir simulators with crosswelltomographic image interpretation.

[0210] To address these challenges to monitoring and optimizing thegeological sequestration of CO₂, the present method:

[0211] (1) implements a new multi-phase flow law to account for theevolving pore-scale geometry and wetting of the fluid phases (toovercome the shortcomings of available reservoir simulators);

[0212] (2) uses improved seismic velocity/attenuation formulas andimplements them into an automated seismic image interpretationalgorithm;

[0213] (3) uses an information theory method to predict the mostprobable state and associated uncertainties in the distribution ofreservoir characteristics;

[0214] (4) integrates the above three with crosswell tomographic imagingof the CO₂ plume; and

[0215] (5) is tested in a well studied Vacuum Field.

[0216] The subsurface is only partially characterized through well log,seismic, surface, and production histories. What is needed is anobjective formulation for integrating all these data into a statisticalframework whereby uncertainties in the spatial distribution of fluids,hydrologic properties, and other factors can be estimated and therelated uncertainties evaluated. The present method uses a rigorousinformation theory approach to assess this uncertainty. It obtains theprobability for the least well constrained pre-CO₂-injection state ofthe repository. This allows it to both predict the likely consequence ofthe injection and to quantify the related risks.

[0217] Data on CO₂ injection are gathered to test the integrated seismicimaging and reservoir simulation technologies. Data include well logs,downhole sampling, core analysis, seismic data, and productioninformation. Formulas for the dependence of seismic velocity andattenuation on local reservoir factors are incorporated into the seismicinterpretation algorithm. Factors accounted for include fluid phasegeometry and wetting, rock texture, and fracturelength/aperture/orientation statistics. The multi-phase flow model andreservoir RTM simulator uniquely provide the level of detail on thesefactors required for reliable seismic image interpretation of both theCO₂ plume and its effects on the repository lithologies and surroundingseals. The seismic formulas, artificial seismic image recreation, andinformation theory are integrated to yield enhanced interpretation ofseismic images (the simulation-enhanced remote geophysics (SERG)technology). This novel approach builds on the simulation-enhancedfracture detection technology shown in FIG. 1 but brings unprecedentedspeed and accuracy to the invasion problem by directly solvingfunctional differential equations for the most probable state andassociated uncertainty.

[0218] The crosswell tomography method provides the resolution to imagesmall changes in seismic velocity due to changes in pore fluidsaturations such as the miscible CO₂ replacement of brine and oil.Crosswell seismic data acquisition requires that a source be placed inone well while recording seismic energy in another well. Seismictomographic reconstruction and imaging enables one to define thevelocity field and reflection image between the two wells. Typicallythree or more receiver wells are selected around the source well so thata quasi three-dimensional view of the reservoir is obtained. The firstset of observations is generally done before CO₂ injection to obtain abaseline for comparison with later time-lapse repeat observations usedto track the progress of the injected CO₂.

[0219] High-frequency crosswell seismology can also utilize bothcompressional and shear waves for delineating the porosity and fracturesystem between wells. However, time-lapse crosswell studies were made ofthe San Andres and Grayburg reservoirs in Vacuum Field at constantreservoir pressure. No significant shear-wave velocity variations werenoted indicating that changes in effective pore pressure play animportant part in the shear-wave response. On the other hand, smallchanges in compressional-wave velocity and amplitude were correlated toactual CO₂ and verified through drilling. (See FIG. 33.) Hence,crosswell seismic is recommended as the tool of choice for monitoringthe flow of CO₂.

[0220] Most reservoirs are geometrically complex and have internalcompartmentation or super-K zones; many are at stress and fluid pressureconditions that make them vulnerable to pore collapse or fractureclosure. This often leads to by-passed petroleum and reservoir damage.The present technology gives quantitative information about thesubsurface needed to address these field development and managementchallenges. The technology is a major advance over presently usedhistory matching or seismic interpretation procedures due to computerautomation and advanced algorithms. The present approach yields (1) themost probable state (spatial distribution of permeability, porosity, oilsaturation, stress, and fractures across a reservoir), (2) the optimalfuture production strategy, and (3) associated risks in thesepredictions. Thus the present approach provides a next-generation fielddevelopment and management technology. The present approach isdemonstrated in a Permian Basin field; the associated reservoirs arecomplex, ample data are available, and traditional history matching hasnot proven to be an adequate field management technology.

[0221] The capability to integrate all or some of the data noted abovegives the present approach a great advantage over presently used historymatching approaches. The unique set of three dimensional, multiplereaction, transport, mechanical process reservoir simulators makes itpossible to integrate input data. The difference between the synthetic(simulated) and observed data is used via information theory to arriveat the most probable state of a reservoir. The informationtheory/reservoir simulation software provides an assessment ofrisk/uncertainty in the present reservoir state and for future fieldmanagement. Several major advances in the present approach over classichistory matching include new computational techniques and concepts thatmake the construction of the preproduction state and associateduncertainty feasible on available hardware. The integration of a widespectrum of data types and qualities is made possible by the uniquelycomprehensive set of RTM processes implemented in the present approach.This allows the approach to integrate seismic, well log, and other datawith historical production information. The approach bringsunprecedented efficiency and risk control to the industry, helping theU.S. to achieve greater fossil fuel independence.

[0222] The present methodology differs from previous methodologies asfollows:

[0223] A self-consistent method is used to relate the degree and methodof upscaling in the reservoir simulator and in defining the spatialscale on which the most probable reservoir state is obtained.

[0224] The number of sensitivity coefficient calculations is greatlyreduced, increasing with the number (N) of grid nodes on which the mostprobable reservoir state is obtained; in contrast, the number of thesecoefficients increases as (N²) for other methods.

[0225] The core and other type of data are more directly imposed on themost probable reservoir state in our method.

[0226] The types of reaction and transport processes accounted for inthe reservoir simulators make it possible to construct an objective(error) function using synthetic seismic, well log, and production data.

[0227] The error function in the present approach decreasesmonotonically with the number of iterations assuming faster andunambiguous convergence to the most probable reservoir stated in thepresent method.

[0228] The current approach is written in a very general way so that itis not restricted to reservoir simulators with simplified physics (e.g.,streamline methods). Fully coupled multi-phase flow, fracture dynamics,formation damage, and other processes are used under the presentapproach.

[0229] In summary, the present approach brings greater efficiency,accuracy, and reliability in determining the most probable reservoirstate.

[0230] The present approach is a viable technology. FIGS. 42a, 42 b, 42c, and 42 d show a 2-D 10×10 km test case domain. FIG. 42a shows thelocations of sixteen monitoring wells (dots) and injection andproduction wells. The Figure is a map of fluid pressure related to theconfiguration of the injection and production wells and the nonuniformdistribution of permeability. Information technology computed theassumed unknown permeability distribution. This example demonstrates themultiple gridding approach. First a coarse permeability field (11×11grid in FIG. 42b) is obtained and used as an initial guess for finerresolved permeability fields (21×21 grid in FIG. 42c and 41×41 grid inFIG. 42d). This process reduces the computational effort to arrive atthe most probable permeability field since it takes only a fewiterations to solve the coarsely resolved problem. The final result inFIG. 42d is in good agreement with the actual high permeability zoneindicated by the thick line, across which the actual permeability jumpsone order of magnitude. FIGS. 37a and 37 b show another 2-D examplewhere only two permeability logs are available. Although bothpermeability logs miss the puncture in the center, the present approachresults in lower permeability at both ends of the domain and higherpermeability in the center. This example demonstrates that the core andwell log data can be directly imposed in the most probable reservoirstate in the present approach, making it-cost effective. As seen inFIGS. 43a, 43 b, and 43 c, the FDM approach can also successfullypredict the initial pressure distribution showing that productionhistory and other dynamic data can be used to reconstruct the reservoirstate. FIG. 43a shows actual distribution of pressure after 30 daysindicating locations of injection and production wells as pressuremaxima and minima. FIG. 43b shows the same territory as in FIG. 43a, butshows the values predicted by the present approach. Note the excellentagreement with FIG. 43a. FIG. 43c compares actual and predicted pressureat one of the pressure monitoring wells. FIGS. 28a and 28 b show thateven a crude discretization captures the overall reservoir shape. FIG.28a shows the actual high permeability zone, and FIG. 28b shows thatpredicted by the model for a 21×21×21 grid. The domain is 10×10×10 km.Smaller scale features in the actual permeability surface are lost onthe predicted one because of the spacing of the pressure monitoringwells and the configuration of the production/injection wells, as wouldbe expected.

[0231] A probability functional method is used to determine the mostprobable state of a reservoir or other subsurface features. The methodis generalized to arrive at a self-consistent accounting of the multiplespatial scales involved by unifying information and homogenizationtheories. It is known that to take full advantage of the approach (e.g.,to predict the spatial distribution of permeability, porosity,multi-phase flow parameters, stress, fracturing) one should embedmultiple reaction, transport, mechanical process simulators in thecomputation. A numerical technique is introduced to directly solve theinverse problem for the most probable distribution of reservoir statevariables. The method is applied to several two- and three-dimensionalreservoir delineation problems.

[0232] The state of a reservoir or other subsurface feature is generallyonly known at selected space-time points on a rather coarse scale. Yetit would be desirable to reconstruct the spatial distribution offluid/rock state across a reservoir or other system. A probabilityfunctional formalism is used to determine such fluid/rock variables asfunctions of position because the subsurface can only be determined withgreat uncertainty, that is, the method analyzes the probability of acontinuous infinity of variables needed to describe the distribution ofproperties across the system.

[0233] This is not readily accomplished without the use of models thatdescribe many fluid/rock variables. For example, a classical historymatching procedure using a single phase flow model could not be used todetermine the preproduction oil saturation across a system. As acomplete understanding of reservoir state involves the fluidsaturations, nature of the wetting, porosity, grain size and mineralogy,stress, fracture network statistics, etc., it is clear that hydrologicsimulators are needed that account for a full suite of reaction,transport, and mechanical processes. The present method is a probabilityfunctional-RTM reservoir simulator approach to the completecharacterization of a subsurface system.

[0234] The state of a reservoir involves variations in space over a widerange of length scales. As suggested in FIGS. 36a, 36 b, and 36 c, theshape and internal characteristics of a reservoir can vary on a widerange of scales including those shorter than the scale on which theobservations could resolve. For example, knowing fluid pressure at wellsseparated by 1 km could not uniquely determine variations ofpermeability on the 10 cm scale. Therefore one considers thedetermination of the most probable state among the unrestricted class ofstates that can involve variations on all spatial scales. FIG. 45suggests that the probability ρ_(k) of variations on a length scale 2π/kbecome independent of k as k→∞. Thus in a classic history matchingapproach, there is an uncountable infinity of solutions. The presentapproach seeks the most probable upscaled state consistent with thescale on which the observations are taken.

[0235] Let a reservoir be characterized by a set of variables Ψ({rightarrow over (r)}) at all points {right arrow over (r)} within the systemat a given time. For example, Ψ({right arrow over (r)}) may representthe values of porosity, grain size and mineralogy, stress, fractures,petroleum vs. water saturation, and state of wetting before productionbegan. The present method seeks the probability ρ[Ψ] that is afunctional of Ψ and, in particular, constructs it to be consistent witha set of observations O(={O₁, O₂, . . . , O_(N)}) at various pointsacross the system or at various times. In addition, assume that an RTMreservoir simulator can compute these observables given an initial stateΨ({right arrow over (r)}). Let Ω(={Ω₁, Ω₂, . . . , Ω_(N)}) be the set ofcomputed values corresponding to O. Clearly, Ω is a functional ofΨ({right arrow over (r)}).

[0236] Information theory provides a prescription for computingprobability. For the present problem, the prescription may be stated asfollows. The entropy S is defined via$S = {{- \underset{\Psi}{S}}\rho \quad \ln \quad \rho}$

[0237] where

indicates a functional integral. Normalization implies $\begin{matrix}{{\underset{\Psi}{S}\rho} = 1.} & (2)\end{matrix}$

[0238] The entropy is to be maximized subject to a set of constraintsfrom the known information. Let C₁, C₂, . . . , C_(Nc) be a set ofconstraints that depend on O and Ω and, therefore, are functionals of Ψ.Introduce two types of constraints. One group, the “error constraints,”are constructed to increase monotonically with the discrepancy between Oand Ω. A second group places bounds on the spatial resolution (thelength scale) over which the method seeks to delineate the reservoirattributes. These constraints are required for self-consistency as thereservoir simulators typically used assume a degree of upscaling imposedby a lack of short scale information and practical limits to CPU time.The constraints are functionals of Ψ(C=C[Ψ]). Impose the “information”$\begin{matrix}{{{\underset{\Psi}{S}\rho \quad C_{i}} = \Gamma_{i}},{i = 1},2,{\cdots \quad {N_{c}.}}} & (3)\end{matrix}$

[0239] Using the Lagrange multiplier method, obtain maximum entropyconsistent with equations (2 and 3) in the form $\begin{matrix}{{\ln \quad \rho} = {{{- \quad \ln}\quad \Xi} - {\sum\limits_{i = 1}^{N_{c}}{\beta_{i}{C_{i}\lbrack\Psi\rbrack}}}}} \\{\Xi = {\underset{\Psi}{S}{{\exp \left\lbrack {\sum\limits_{i = 1}^{N_{c}}{\beta_{i}C_{i}}} \right\rbrack}.}}}\end{matrix}$

[0240] The βs are Lagrange multipliers and Ξ is the normalizationconstant.

[0241] The present approach focuses on the most probable state Ψ^(m).The maximum in ρ occurs when${\sum\limits_{i = 1}^{N_{c}}{\beta_{i}\frac{\delta \quad C_{i}}{\delta \quad {\Psi_{\alpha}\left( \overset{\_}{r} \right)}}}} = 0.$

[0242] Here δ/dΨ_(α) indicates a functional derivative with respect tothe α-th fluid/rock state variable. The present method solves thesefunctional differential equations for the spatial distribution of the Nreservoir attributes Ψ₁ ^(m)({right arrow over (r)}),Ψ₂ ^(m)({rightarrow over (r)}), . . . Ψ_(N) ^(m)({right arrow over (r)}).

[0243] There are two sets of conditions necessary for the solution ofequation (4). The character of the homogenization constraints is thatthey only have an appreciable contribution when Ψ has spatial variationson a length scale smaller than that assumed to have been averaged out inthe upscaling underlying the RTM reservoir models used to construct theΨ-dependence of the Ω.

[0244] The functional dependence of the predicted values Ω[Ψ] on thespatial distribution of reservoir state Ψ({right arrow over (r)}) isdetermined by the laws of physics and chemistry that evolve the“fundamental” fluid/rock state variables Ψ. These fundamental variablesinclude

[0245] stress;

[0246] fluid composition, phases, and their intra-pore scaleconfiguration (e.g., wetting, droplet, or supra-pore scale continuousphase);

[0247] grain size, shape, packing, and mineralogy and their statisticaldistribution;

[0248] fracture network statistics; and

[0249] temperature.

[0250] With these variables, the method predicts the derivativequantities (e.g., phenomenological parameters for the RTM process laws):

[0251] permeability;

[0252] relative permeabilities, capillary pressure, and othermulti-phase parameters;

[0253] rock Theological parameters; and

[0254] thermal conductivity.

[0255] From the last one, one can, through the solution of reservoir RTMequations, determine the functionals Ω[Ψ]. Thus Ψ is considered to bethe set of fundamental variables at some reference time (e.g., justprior to petroleum production or pollutant migration). The dependence ofΩ on Ψ comes from the solution of RTM equations and the use ofphenomenological laws relating the derived quantities to the fundamentalones.

[0256] This approach uses information theory to provide a mathematicalframework for assessing risk. Information theory software is used tointegrate quantitative reservoir simulators with the available fielddata. The approach allows one to:

[0257] use field data of various types and quality;

[0258] integrate the latest advances in reservoir or basinmodeling/simulation into production planning and reserve assessment;

[0259] predict the quantitative state (distribution of porosity,permeability, stress, reserves in place) across the system;

[0260] place quantitative bounds on all uncertainties involved in thepredictions and strategies; and

[0261] carry out all the above in one automated procedure.

[0262] This technology improves the industry's ability to develop knownfields and identify new ones by use of all the available seismic, welllog, production history, and other observation data.

[0263] The present approach is a self-consistent method for finding themost probable homogenized solution by integrating multiple scaleanalysis and information theory. The self consistency is in terms oflevel of upscaling in the reservoir simulator used and the spatial scaleto which one would like to resolve the features of interest.Furthermore, the homogenization removes the great number of alternativesolutions of the inverse problem which arise at scales less than that ofthe spatial resolution of data. The great potential of the method todelineate many fluid/rock properties across a reservoir is attainedthrough the use of multiple RTM process simulators. Finally, havingembedded the computations in an overall context of information theory,the approach yields a practical method for assessing risk.

VII. Seismic and Well Log Inversion and Interpretation

[0264] Consider the use of a sonic log to determine the geothermalgradient that operated during basin evolution. To demonstrate themodel's approach, use a Basin RTM simulation run at 30° C./km as theobserved data, shown in FIG. 25a. FIG. 25b is a plot of the quadraticerror E (the sum of the squares of the difference in observed log valuesand their Basin RTM synthetic log values at a given geothermalgradient). Note the well pronounced minimum at the correct geothermalgradient. What is most encouraging is that the existence of a minimum inE vs. geothermal gradient remains even when the observed data containsrandom noise. As seen in FIG. 25b, the error has a perceivable minimumat about 30° C./km, proving the practicality of this approach inrealistic environments.

[0265] The method similarly shows promise when used to determinemultiple basin history or other variables. To illustrate this point,consider a production problem wherein the objective is to find thespatial extent of and permeability in a zone of enhanced permeabilitywithin a reservoir (the circular zone in FIG. 27a). FIG. 27a shows avertical cross-section and indicates the location of production andinjection wells represented by (−) and (+), respectively. FIG. 27b showsa 3-D depiction of the dependence of the quadratic error on the radiusof and permeability in the circular zone of enhanced permeability. Thedark “valley” of FIG. 27b is the zone of minimum error while the dark“peak” is the zone of maximum error. The model uses efficient ways offinding the global minimum of the error in the space of the basinhistory parameters.

[0266] Formulas relate the sonic, resistivity, gamma ray, and neutrallog signals to the texture (grain size, shape, packing and mineralogy,and porosity) and fluid properties (composition, intra-pore geometry,and saturation of each fluid phase). These formulas allow the creationof synthetic well logs to be used in the optimization algorithm of FIG.1.

[0267] Difficulties with seismic interpretation come from the manyfactors affecting wave velocity and attenuation:

[0268] matrix porosity and texture;

[0269] density and phases of pore- and fracture-filling fluids;

[0270] fracture length, aperture, and connectivity;

[0271] fracture orientation relative to the propagation direction;

[0272] fracture cement infilling volume, mineralogy, and texture; and

[0273] pressure and temperature.

[0274] What is needed for more accurate monitoring is a set of formulasfor these dependencies. The key to the success of this facet of thepresent method is that the pore-scale geometry of the fluids as well asthe grain size and mineralogy, porosity, and other predictions of theRTM model provide the information needed to compute the velocities andattentions at all spatial points in the 3-D domain. As the velocitiesand attentions depend on so many variables (in addition to CO₂ fluidsaturation), the present method is comprehensive enough to attainunambiguous imaging of the CO₂ plume as well as possible changes in thereservoir induced by CO₂ injection. The present method uses improvedseismic wave velocity and attenuation formulas so as to be compatiblewith the phase geometry model.

[0275] Biot's theory of wave propagation in saturated porous media hasbeen the basis of many velocity and attenuation analyses. Biot's theoryis an extension of a poroelasticity theory developed earlier. Biotpredicted the presence of two compressional and one rotational wave in aporous medium saturated by a single fluid phase. Plona was the first toexperimentally observe the second compressional wave. In the case ofmulti-phase saturated porous media, the general trend is to extendBiot's formulation developed for saturated media by replacing modelparameters with ones modified for the fluid-fluid or fluid-gas mixtures.This approach results in two compressional waves and has been shown tobe successful in predicting the first compressional and rotational wavevelocities for practical purposes. Brutsaert, who extended Biot'stheory, appears to be the first to predict three compressional waves intwo-phase saturated porous media. The third compressional wave was alsopredicted by Garg and Nayfeh and by Santos et al. Tuncay and Corapciogluderived the governing equations and constitutive relations of fracturedporous media saturated by two compressible Newtonian fluids by employingthe volume averaging technique. In the case of fractured porous media,Tuncay and Corapcioglu showed the existence of four compressional andone rotational waves. The first and third compressional waves areanalogous to the compressional waves in Biot's theory. The secondcompressional wave arises because of fractures, whereas the fourthcompressional wave is associated with the capillary pressure.

[0276] The challenge of interpreting seismic (and other remotegeophysical) images is their non-unique relation to the distribution inspace of the many factors that affect wave velocity and attenuation.However, much information about the state of a reservoir exists in theother data (production history, well logs, cores, fluid samples, surfacegeology) available to a CO₂ sequestration team. The present approach (1)minimizes interpretation errors by automating the use of all these datato estimate the most likely value of the uncertain reservoir parameters;and (2) uses information theory to assess the uncertainties (andassociated risk) in the reservoir parameters so determined.

[0277] Information theory provides an advanced seismic imageinterpretation methodology. Classical seismic image interpretation isdone using geological intuition and by discerning patterns in the datato delineate faults, formation contacts, or depositional environments.The present approach integrates the physics and chemistry in the RTMsimulator and the seismic data to interpolate between wells. Thisapproach has two advantages: (1) it provides wave properties at allspatial points within the reservoir and (2) it uses basic laws ofphysics and chemistry. This gives geoscientists a powerful tool for theanalysis of remote geophysical data.

[0278] This advanced interpretation technology is applied to remotelydetect fractures in tight reservoirs. The present method adds theimportant aspect of risk assessment and the special challenge of two andthree phase flow expected in the CO₂ sequestration problem.

[0279] A result of a simulation-enhanced seismic image interpretationapproach is seen in FIGS. 25a, 25 b, 34, and 35. FIG. 25a shows porosityand compressional seismic wave velocity as predicted by the Basin RTMprogram for a 25.9 million year simulated evolution. Such profiles ofpredicted wave velocity (and attenuation) are used to constructsynthetic seismic signals as seen in FIG. 34. Note that the two cases inFIG. 34 differ only in the geothermal gradient assumed present duringbasin evolution. FIG. 35 shows the error (the difference between thepredicted and observed signals) as a function of geothermal gradient(for illustrative purposes here, the “observed” signal is the 30° C/kmsimulation).

[0280] The error shown in FIG. 35 is computed as a quadratic measure:$E = {\sum\limits_{i = 1}^{M}{\left( {\Omega_{i} - O_{i}} \right)^{2}.}}$

[0281] Here O_(i) and Ω_(i) are members of a set of M observed andsimulated values of quantities characterizing the seismic signal(arrival times, amplitudes, or polarizations of a one, two, or threedimensional data set). The predicted attributes Ω_(i) depend on thevalues of the least well constrained reservoir parameters (such as thegeothermal gradient or overall tectonics present millions of years ago).Two different sets of Ω, O are shown in FIG. 35 that are from the samestudy but involve different seismic attributes (raw signal and acorrelation function). These examples show that the error can havemultiple minima so that (1) care should be taken to find the globalminimum and (2) one should develop the most reliable error measure.Another concern is the robustness of the method to the presence of noisein the observed seismic signal. These issues are investigated here inthe context of CO₂ sequestration.

[0282] Results of the information theory approach are shown in FIGS.27a, 27 b, 27 c, 37 a, and 37 b. FIG. 27a shows an application for acase wherein the geometry of the Super-K (anomalously high permeability)zone is constrained to be circular and information theory is used todetermine the permeability and radius of this circular zone. Thissimplified study is used to show the relationship between the reducedfunction space and a complete analysis of the full probabilitydistribution.

VIII. Information Theory for Applied Geoscience Problems

[0283] A major feature of the present method is an algorithm forcomputing the most probable reservoirs state and associated riskassessment. To quantify risk one should obtain an objective methodologyfor assigning a probability to the choice of the least well controlledvariables. The present approach is based on the information theory butdiffers from other applications in geostatistics in that the approachintegrates it with RTM simulation as follows.

[0284] The following is a description of how the present method computesthe probability of reservoir state. The starting point is theprobability ρ[Ψ] for continuous variable(s) Ψ({right arrow over (r)})specifying the spatial ({right arrow over (r)}) distribution ofproperties of the preproduction fluid/rock system. Information theory isgeneralized as follows. The entropy S is given as a type of integral ofρlnρ over all possible states Ψ({right arrow over (r)}). In the presentexample, Ψ({right arrow over (r)}) is a continuous infinity of values,one for each spatial point {right arrow over (r)}. Thus, S is a“functional integral” designated:$S = {{- \underset{\Psi}{S}}\rho \quad \ln \quad \rho}$

[0285] where

implies functional integration. In the spirit of information theory, ρis the probability functional that maximizes S subject to normalization,${\underset{\Psi}{S}\rho} = 1.$

[0286] Let O(={O₁, O₂, . . . , O_(M)}) be a set of M observations (i.e.,discretized seismic, well data, or production history information). Forsimplicity here, assume one type of data. Let Ω_(l)(l=1, 2, . . . M) bea set of values corresponding to the 0_(l) but as predicted by areservoir or other model. The Ω_(l) are functionals of the spatialdistribution of reservoir characteristics, i.e., Ω=Ω[Ψ]. Define theerror E[Ψ] via $\begin{matrix}{{E\lbrack\Psi\rbrack} = {\sum\limits_{l = 1}^{M}{\left( {{\Omega_{l}\lbrack\Psi\rbrack} - O_{l}} \right)^{2}.}}} & (5)\end{matrix}$

[0287] Constrain ρ by requiring that E have a specified ensemble averagevalue, E*, estimated from an analysis of errors in the reservoir modeland observations; thus,${\underset{\Psi}{S}{E\lbrack\Psi\rbrack}{\rho \lbrack\Psi\rbrack}} = {E^{*}.}$

[0288] Also constrain the spatial scale on which Ψ can vary. In a sense,seek the probability density ρ for an upscaled (locally spatiallyaveraged) Ψ. To do so, use a homogenization constraint denoted C₂: thelatter provides the preferred weighting of ρ towards smoother Ψ({rightarrow over (r)}) so as to make the predicted most probable stateconsistent with what was used for upscaled in the reservoir model.Introducing Lagrange multipliers β₀, β₁, β₂ gives:

ln ρ[Ψ]=−β₀−β₁ E[Ψ]−β ₂ C ₂[Ψ].

[0289] A central objective of the present approach is to compute themost probable distribution, i.e., that for which the functionalderivative δρ/δΨ({right arrow over (r)}) vanishes. This most probablestate satisfies${\frac{\delta \quad E}{\delta \quad {\Psi \left( \overset{\_}{r} \right)}} + {\lambda \frac{\delta \quad C_{2}}{\delta \quad {\Psi \left( \overset{\_}{r} \right)}}}} = 0$

[0290] where λ=β₂/β₁. The higher the spatial scale of upscaled mostprobable state sought, the larger the λ chosen. Without the λ-term andwith coarse spatial resolution of the known data, there is anuncountable number of distributions Ψ({right arrow over (r)}) thatminimize E[Ψ], i.e., for which δE/δT=0.

[0291] In this family of solutions, there are members such as suggestedin FIG. 36a or others corresponding to a short scale mosaic ofvariations in Ψ({right arrow over (r)}). Thus the inclusion of the C₂term filters the ensemble to favor smoother Ψ-distributions. This is apractical consideration as only an overall resolution of the Ψ({rightarrow over (r)}) delineation problem is usually required for petroleumE&P applications. Finally, the parameter β₀ is determined fromnormalization in terms of β₁ and β₂, whereas β₁ and β₂ follow from theconstraints from E and C₂.

[0292] Uncertainty in the most probable state can be estimated. LetΨ^(m)({right arrow over (r)}) be the most probable state of the system(i.e., a solution of equation (6)). Introduce an uncertainty measure uvia${V_{T}u^{2}} = {\underset{\Psi}{S}{\rho \lbrack\Psi\rbrack}{\int{{^{3}r}\left\{ {{\Psi \left( \overset{\rightharpoonup}{r} \right)} - {\Psi^{m}\left( \overset{\rightharpoonup}{r} \right)}} \right\}^{2}}}}$

[0293] where V_(T) is the total volume of the system. With thisdefinition, u^(½) is an RMS uncertainty in Ψ about its most probabledistribution Ψ^(m). u is expected to increase as the spatial coverageand accuracy of the observed data O degrades.

[0294] An important feature of the approach is that it can integratemultiple types of data (seismic, well logs, production history) or dataof various quality (old versus modern production history). To do so,introduce an error E_((k)) for each of N_(e) data types (k=1, 2, . . . ,N_(e)). In analogy with equation (5), write$E_{(k)} = {\sum\limits_{i = 1}^{N_{chj}}\left( {\Omega_{{(k)}i} - O_{{(k)}i}} \right)^{2}}$

[0295] where Ω_((k)i) is the i-th data of the k-th set (i=1, 2, . . . ,N_((k))). Again, one can impose the constraints${\underset{\Psi}{S}\rho \quad E_{(k)}} = E_{(k)}^{*}$

[0296] for estimated error E_((k)).

[0297] The data types (Ω_((k)), O_((k))) include production history,seismic, core analysis, and well logs. The functional dependence of theΩs on reservoir state is computed via the reservoir simulator. The mostprobable state is computed by solving the functional differentialequation (6) generalized for multiple data sets and state variables. Thecomputational algorithms, efficient evaluation of uncertainty, andparallel computing techniques make the present method a major stepforward in history matching and crosswell tomographic imageinterpretation.

[0298] An information theory approach is used to determine the mostprobable state of a reservoir and the associated uncertainty.Quantifying the state of the subsurface provides a challenge for thepetroleum industry:

[0299] available information consists of mixed data types and qualityand with different and often sparse spatial or temporal coverage;

[0300] the overall shape and location of a reservoir and its internalstate (permeability and porosity distribution and reserves in place) areoften uncertain;

[0301] there are many uncertainties about the preproduction reservoirstate; and

[0302] while there is often a great quantity of data available, theiruse in limiting the uncertain geological and engineering parameters issubject to interpretation rather than being directly usable in acomputer-automatable procedure.

IX: A Second Exemplary Application: Cell Modeling for Drug Discovery,Treatment Optimization, and Biotechnical Applications

[0303] This section presents internal details of embodiments ofCyber-Cell. As such, this section is exemplary only and is not meant torestrict the scope of the claimed invention.

[0304] A second embodiment of the invention models living cells.Cyber-Cell is an integrated cell simulation and data methodology usefulfor drug discovery and treatment optimization. Cyber-Cell uses aninformation theory framework to integrate experimental data. Throughinformation theory and the laws of chemistry and physics, Cyber-Cellautomates the development of a predictive, quantitative model of a cellbased on its DNA sequence.

[0305] Cyber-Cell accepts a DNA nucleotide sequence as input. Applyingchemical kinetic rate laws of transcription and translationpolymerization, Cyber-Cell computes the MRNA and protein populations asthey occur autonomously, in response to changes in the surroundings, orfrom injected viruses or chemical factors. Cyber-Cell uses rulesrelating amino acid sequence and function and the chemical kinetics ofpost-translational protein modification to capture the cell's autonomousbehavior. A full suite of biochemical processes (including glycolysis,the citric acid cycle, amino acid and nucleotide synthesis) areaccounted for with chemical kinetic laws.

[0306] Data input to Cyber-Cell include microscopy, genomics,proteomics, multi-dimensional spectroscopy, x-ray crystallography,thermodynamics, biochemical kinetics, and bioelectric information.Advances in genomic, proteomic, biochemical, and other techniquesprovide a wide range of types and quality of data. Cyber-Cell integratescomprehensive modeling and data into an automated procedure thatincorporates these ever-growing databases into the model development andcalibration process.

[0307] Cyber-Cell is self-sustaining. For example, mathematicalequations generate RNA from the DNA nucleotide sequence usingpolymerization kinetics and post-translational modifications. From thisRNA, Cyber-Cell generates the proteins which, through function-sequencerules, affect the metabolic processes. This closes one of the feedbackloops among the many processes underlying living cell behavior, as shownin FIG. 46. That Figure shows how DNA nucleotide sequence data are usedin a self-consistent way to generate cell reaction-transport dynamics byfeedback control and coupling of metabolic, proteomic, and genomicbiochemistry. This allows the development of a model of increasingcomprehensiveness in an automated fashion, greatly improving theefficiency of the model-building process via its information theoryapproach.

[0308] Cyber-Cell accounts for the many compartments into which a cellis divided and within each of which specialized biochemical processestake place, as suggested by FIG. 47. FIG. 47 shows some of theintracellular features that Cyber-Cell models by evolving them viamesoscopic equations solved on a hexahedral finite-element grid. Forexample, E. coli's key features include the nucleoid and ribosomes,while other prokaryotes have these features as well as the mesosome. Theintracellular features are treated with a mesoscopic reaction-transporttheory to capture atomic scale details and corrections to thermodynamicsdue to the large concentration gradients involved. Metabolic reactionsand DNA/RNA/protein synthesis take place in appropriate compartments,and active and passive molecular exchange among compartments isaccounted for. Cyber-Cell models transport and reaction dynamics thattake place in the membrane-bound organelles of eukaryotic cells.Cyber-Cell accounts for the wide separation of time scales (nanosecondsto hours) on which cellular rate processes take place, using multipletime scale techniques.

[0309] Conservation equations compute nucleotide/amino acidconcentrations, and polymerization kinetics govern the time course ofRNA synthesis. Protein polymerization kinetics are accounted for viarate phenomenologies that allow for cross-coupled control of metabolicnetworks and other processes. Bioelectrically mediated membranetransport is computed to keep track of the exchange of molecules betweenthe cell's interior and the external medium. Cyber-Cell's embeddedinformation theory framework achieves an integration of model and datafor automated cell model building and simulation. Uniqueness is acritical issue in the development of a model of a complex system—can theavailable data discriminate among models? For example, the overallreaction x+y+z→product with an observed rate proportional to theconcentration product xyz can correspond to the more likely mechanism(x+y

(xy),(xy)+z→product) and two other similar permutations. Also, severalproteomes upon tryptic digestion can yield the same MDS(multi-dimensional spectroscopy) signal/separation. Cyber-Cell'sintegration of model and data through information theory surmounts thisproblem. For example, there are (by postulate) many fewer fundamentalrules of transcription and translation than the number of types of mRNAand proteins in a cell. Cyber-Cell facilitates the use of the MDS andother data to interpret the proteome. Furthermore, as the proteome, forexample, depends on metabolism (notably amino acid production), thewealth of biochemical, membrane transport, and other data used inCyber-Cell helps to constrain the “inversion” of the spectroscopic andother data to yield a more specific identification of the proteins. Asmore and more data become available, Cyber-Cell's fully automatedprocedure develops a model of increasing accuracy and uniqueness.

[0310] To capture a wide range of cellular phenomena and to achieve anintegration with experimental data, Cyber-Cell includes a comprehensiveset of cell reaction, transport, and genomic processes. As a result,Cyber-Cell includes these features:

[0311] nonlinearity and multiple, stable, cellular states (see FIGS. 48aand 48 b);

[0312] multiple time scale (fast/slow) reaction formalism;

[0313] nonlinear dynamics of interacting local sites of reaction;

[0314] bioelectricity;

[0315] polymerization kinetics;

[0316] passive membrane transport and attendant nonlinearity;

[0317] translation and transcription polymerization chemical kinetics;and

[0318] mesoscopic structures (e.g., macromolecules, the nucleoid of aprokaryote, etc.) that are too small to treat by usual macroscopicreaction-transport theory. Their atomic scale features should beaccounted for in capturing their biochemical functionality.

[0319] As an example of cellular nonlinear phenomena, FIG. 48a showssustained oscillations in Saccharomyces cerevisiae in a continuous-flowstirred tank reactor. In FIG. 48b, Cyber-Cell demonstrates thatnonlinear rate laws may allow a cell to make a transition from a normalstate to an abnormal one without the possibility of ever returning tothe normal state no matter how the surrounding conditions are changed.

[0320] The internal complexities of a typical cellular system are shownin FIG. 47. Simplified models (e.g., of one biochemical pathway orcompartment) are not satisfactory; such subsystems are so stronglycoupled to the rest of the cell that their isolated dynamics do notyield a true picture of the multi-process, compartmentalized livingcell. Cyber-Cell's design is flexible (reactions are written withgeneral stoichiometry, rate laws can be easily modified, etc.), and ittakes advantages of advances in genomic and proteomic data andsupercomputing to grow with the expected expansion of cellulardatabases.

[0321] The metabolic kinetics and transport features of Cyber-Cell (seeFIG. 46) have been tested on Trypanosoma brucei. T. brucei rhodesienseand T. brucei gambienese are the parasites responsible for sleepingsickness in humans, and T. brucei causes Nagana in domestic animals.FIG. 49a shows T. brucei's “long and slender” form with a longflagellum. The single mitochondrion is forced in a peripheral canal withalmost no cristae; there are no cytochromes, and the citric acid cycledoes not function. In FIG. 49b, T. brucei is in its “stumpy” form withan expanded mitochondrial canal. The mitochondrion participates in cellmetabolism. Shown in FIG. 49c are Cyber-Cell predicted concentrations ofsome of the chemical species within the glycosome as a function of timefor a transient experiment. FIG. 50 compares the predicted results withobserved steady state values: column one shows measured concentrations,column two shows Cyber-Cell's simulation of the same system.

[0322] Cyber-Cell's RNA polymerization kinetics have also been tested.The T7 family of DNA-dependent RNA polymerases represents an idealsystem for the study of fundamental aspects of transcription because ofits simplicity: T7 RNA polymerases do not require any helper proteinsand exist as single subunits. These single-subunit RNA polymerases arehighly specific for an approximately twenty base pair, nonsymmetricpromoter sequence. One major transcript GGGAA and five other mistakesare seen in FIGS. 51a and 51 b. The mistakes arise from misinitiation orpremature termination. The polymerization model implemented inCyber-Cell accounts for these mistakes, and its results compare wellwith experimental data. FIG. 51a shows transcription by a bacteriophageT7 RNA polymerase system inserted in E. coli. This Cyber-Cell simulationagrees with the experimental results shown in FIG. 51b. In the latterFigure, experimental data are shown on in vitro RNA synthesis showingthe sequencing and strand length after ten minutes of evolution. The T7RNA polymerase system is a test case that demonstrates the validity ofCyber-Cell's mathematics and is not used to calibrate transcription.Another Cyber-Cell simulation is seen in FIG. 52, where HIV-1transcription of the Philadelphia strain is considered. The number oftranscribed strands of various length intervals are shown as a functionof time. Strand set one is the sum of nucleotides from length 1 to 1000,set two is for strands of length 1001 to 2000, and so on.

[0323] In some embodiments, Cyber-Cell runs in four modes:

[0324] a model building/calibration mode wherein model parameters aredetermined using experimental data of a variety of types (FIG. 53a);

[0325] a probability functional mode for estimating the most probabletime-course of key species whose mechanisms of production or destructionare not known;

[0326] a mode wherein estimated Cyber-Cell input or output data areassigned uncertainties; and

[0327] a mode to aid an investigator in designing experiments to reducethe uncertainties in model parameters.

[0328] Cyber-Cell divides the system to be modeled into N_(c)compartments labeled α=1, 2, . . . N_(c). There are N molecular specieslabeled i=1, 2, . . . , N of concentrations c_(i) ^(α)(t) at time t.Conservation of mass for species i in compartment α implies$\begin{matrix}\left( \left. {{V^{\alpha}\frac{c_{i}^{\alpha}}{t}} = {{\sum\limits_{\alpha^{\prime} \neq \alpha}{A^{{\alpha\alpha}^{\prime}}h_{i}^{{\alpha\alpha}^{\prime}}E_{i}^{{\alpha\alpha}^{\prime}}}} + J_{i}^{{\alpha\alpha}^{\prime}} + {v^{\alpha}{Rxn}}}} \right) \right)_{i}^{\alpha} & (7)\end{matrix}$

[0329] where

[0330] h_(i) ^(αα′)=permeativity of species i between compartments α andα′;

[0331] E_(i) ^(αα′)=factor which, at exchange equilibrium for passivetransport between compartments α and α′ for species i, is zero;

[0332] J_(i) ^(αα′)=net rate of active transport of species i fromcompartment α′ to α;

[0333] A^(αα′)=surface area between compartments α and α′;

[0334] V^(α)=volume of compartment α;

[0335] Rxn)_(i) ^(α)=net rate of reaction in compartment α for species i(moles/volume-time); and

[0336] V_(τi) ^(α)=stoichiometric coefficient for species i in reactionτ in compartment α. For eukaryotes, the h parameters are fluxcoefficients for transfer of species across membrane-bound organelles.For prokaryotes, the h parameters are permeativities associated with thesurroundings, while for the internal compartments (e.g., nucleoid,mesosome) they serve as rate coefficients for molecular exchange withthe cytosol. However, Cyber-Cell optionally treats internal dynamics ofinternal compartments, such as the nucleoid, using mesoscopic equations.Coulomb forces impose charge neutrality within each compartment; ifz_(i) and c_(i) ^(α) are the valence and concentration of species i incompartment α, respectively, then${\sum\limits_{i = 1}^{N}{z_{i}c_{i}^{\alpha}}} = 0.$

[0337] Formulas for the activity of species i in each compartment andthe rate laws for transport across the membranes complete the model,yielding electrical potential and concentration in each compartment.Biochemical reactions proceed on a wide range of time scales (fromnanoseconds to days). Thus, for practical and conceptual reasons,Cyber-Cell divides reactions into fast and slow groups. With this, thereaction term in equation (7) is rewritten$\left. ({Rxn}) \right)_{i}^{\alpha} = {{\sum\limits_{k = 1}^{N^{\alpha \quad f}}{v_{ki}^{\alpha \quad f}\frac{W_{k}^{\alpha \quad f}}{ɛ}}} + {\sum\limits_{k = 1}^{N^{\alpha \quad s}}{v_{ki}^{\alpha,\quad s}W_{k}^{\alpha,s}}}}$

[0338] where the smallness parameter ε<<1 emphasizes the large ratecoefficients of the fast reactions relative to those of the slow ones.Using the equilibrium submanifold projection technique, such rateproblems are solved in the limit ε→0. The generality of this approachallows for the automated creation of reactions, and thereby informationtheory is used to guide the model building effort of Cyber-Cell.

[0339] Cyber-Cell accounts for the interplay between the molecular scale(at which information is stored and molecular function is determined)and the macroscopic scale of metabolite balance. To do this, Cyber-Cellreads and transfers nucleotide and amino acid sequences through apolymerization kinetic model. Thereby Cyber-Cell utilizes the growinggenomic and proteomic databases for model development, calibration, andsimulation of cell behavior. This is illustrated by considering thekinetics of RNA and protein synthesis. (See FIG. 54.) Key aspects of thesynthesis of these macromolecules are the role of a template molecule(e.g., mRNA for proteins) and the mediation by enzymes in controllingthe biopolymerization. Cyber-Cell uses a chemical kinetic formalism tocapture effects of DNA/RNA/protein synthesis. In order to complete thecoupling of these syntheses to the rest of the cell processes,Cyber-Cell uses relations between sequence and function as they becomeknown in the art.

[0340]FIG. 54 illustrates the need for Cyber-Cell's complexpolymerization chemical kinetics. In the Figure, a polymerase or editingsystem (performing read, write, or edit (RWE) functions) accepts atemplating DNA/RNA strand and produces a new strand (DNA, RNA, orprotein). The RWE complex binds to the template and advances along thetemplating strand, reading its information in search of the initiationsequence where the R WE forms a closed complex on the promoter sequence.An isomerization occurs whereby an open complex is formed.Polymerization takes place where the appropriate nucleotide sequence islaid according to the DNA sequence for the seven to twelve area basepairs or the DNA strand that the enzyme covers. Auxiliary molecules maycomplex with an RWE unit to modify its kinetics (i.e., rules of readingthe templating strand to decide on initiation, elongation, andtermination). The σ-subunit of the enzyme must detach in order for theenzyme to have a strong affinity for nonspecific DNA. If the σ-subunitdoes not detach, abortive mRNAs are created, otherwise elongationoccurs. Some RWE complexes can read the new strand and edit it bydeletion or addition processes. Finally, end units can be added to thenew strand in a process mediated by an RWE. A given cell may haveseveral types of RWEs.

[0341] The essential chemical species is a complex of an RWE unit withthe templating and new strands. To characterize this complex, Cyber-Cellkeeps track of the location n on the template strand being read and thepresence or absence of any auxiliary factors. Cyber-Cell also accountsfor the complexing to an add-unit ω (amino acids for proteins andnucleotides for DNA or RNA). Example Cyber-Cell reactions formulated tocapture the aforementioned processes are as follows:Enz + DNA(gene) ⇔ (E ⋅ DNA)_(gene, n)^(closed)(E ⋅ DNA)_(gene, n)^(closed) ⇔ (E ⋅ DNA)_(gene, n)^(closed)(E ⋅ DNA)_(gene, n)^(open) + ntp ⇔ (E ⋅ DNA ⋅ RNA)_(gene, n + 1)^(open)(E ⋅ DNA ⋅ RNA)_(gene, p)^(open) + ntp ⇔ (E ⋅ DNA ⋅ RNA)_(gene, p + 1)^(open)where  p = n + 1, n + 6(E ⋅ DNA ⋅ RNA)_(gene, n + 6)^(open) + ntp− > (E ⋅ DNA ⋅ RNA ⋅ no  σ)_(gene, n + 7)^(open)(E ⋅ DNA ⋅ RNA ⋅ no  σ)_(gene, m)^(open) + ntp ⇔ (E ⋅ DNA ⋅ RNA ⋅ no  σ)_(gene, m + 1)^(open)where  m = n + 7, N − 1(E ⋅ DNA ⋅ RNA ⋅ no  σ)_(gene, N)^(open)− > Enz + DNA(gene) + RNA(gene).

[0342] The process starts at the promoter region.

[0343] Cyber-Cell's formalism captures the biochemical control of thecellular system. For example, complexing with an auxiliary molecule maymake one pathway possible (e.g., location of initiation or termination,nature of editing) while another auxiliary factor or set of complexedfactors may favor another pathway. The above approach is used formodeling E. coli, the in vitro T7 RNA polymerase (FIGS. 51a and 51 b),and the HIV (FIG. 52). In the HIV case, the full length HIV RNA strandsare templated from HIV DNA inserted in a host helper T-cell. Thesefeatures can be tested using data from E. coli, Saccharomycescerevisiae, and their subsystems, for which laboratory kinetics data areavailable. Such test systems serve to calibrate the parameters (chemicalrate, transport, etc.) in Cyber-Cell as values for those systems orpreliminary values for analogous systems. The information theory shellprogram in Cyber-Cell greatly facilitates the use of a variety ofgenomic and proteomic data to carry out this calibration.

[0344] Intracellular mesoscopic structures (e.g., the nucleoid, globulesand bubbles, ribosomes) should not be treated using the macroscopicreaction-transport theory as described above. Free energy-minimizingstructures are often not global minima, but are rather functioningentities that are local minima lying close to the global minimum.

[0345] Cyber-Cell models simple and multi-phase liquid droplets immersedin a host medium. Composite structures of multiple macromolecules areanalyzed via a global coordinate approach. Micelles, nucleoids,ribosomes, and other mesoscopic objects made of a shell of molecules cantake on morphologies dictated by the number and shape of theshell-forming molecules and their distribution over the shell. Thefollowing is a formalism for determining the relationship between thecomposition and the shape of these mesoscopic objects.

[0346] Consider a body surrounded by a shell of N molecular types i=1,2, . . . , N. Let σ_(i) be the number of molecules of type i per surfacearea. FIG. 55a suggests the morphology of mesoscopic objects consistingof an interior medium (S<0) surrounded by a bounding surface (S=0)immersed in an external medium (S>0). The morphology results from thecoupling of the curvature of the shell (S=0) and the distribution ofmolecules of various types within the shell. The objective is toconstruct the free energy functional F[σ, S] and delineate the freeenergy-minimizing structures it implies. First write the free energy asan integral of the free energy density f(σ, τ) over the surface S=0:F = ∫_(S = 0)²rf.

[0347] Through the curvature tensor, τ, of the domain of integration, Fdepends on the shape function S. As a first approximation, f can bewritten as $\begin{matrix}{f = \quad {{f^{cl}\left( \underset{\_}{\sigma} \right)} + {\frac{1}{2}{\sum\limits_{\alpha_{1},\alpha_{2},\alpha_{3},{\alpha_{4} = 1}}^{3}{\Gamma_{\alpha_{1}\alpha_{2}\alpha_{3}\alpha_{4}}{\overset{\sim}{\kappa}}_{\alpha_{1}\alpha_{2}}{\overset{\sim}{\kappa}}_{\alpha_{3}\alpha_{4}}}}} +}} \\{\quad {{\frac{1}{2}{\sum\limits_{i,{j = 1}}^{N}{\Lambda_{ij}{{\overset{\rightarrow}{\nabla}\sigma_{i}} \cdot {\overset{\rightarrow}{\nabla}\sigma_{j}}}}}},}}\end{matrix}$

[0348] where {tilde over (τ)} is τ minus a σ-dependent reference valuethat incorporates the effect of molecular shape. In FIG. 55b, theindented area is induced by the presence of one type of molecule (darkarea) that reflects the sign and magnitude of the preferred radius ofcurvature associated with the dark vs. the light molecules. Theenergy-minimizing structures are the solution of the followingequations:${\frac{\delta \quad F}{{\delta\sigma}_{i}} = \quad {\overset{\_}{\mu}}_{i}}\quad$$\frac{\delta \quad F}{\delta \quad S} = {\sum\limits_{i = l}^{N}{{\overset{\_}{\mu}}_{i}\frac{\delta}{\delta \quad S}{\int_{s = 0}{{^{2}r}\quad \sigma_{i}}}}}$

[0349] for Lagrange multiplier {overscore (μ)}_(i).

[0350] Macromolecules may aggregate into ribosomes, nucleoids, or othermesostructures. Also, the escape of RNA from and the import ofnucleotides into the nucleoid, with its maze of DNA and other molecules,occurs in a geometrically restricted and crowded environment. These andother key biochemical processes typically take place without alteringthe bonding relations among the constituent atoms. Thus although localstructure may only change slightly, the cumulative effect is a largedeformation or assembly of the mesostructure. Cyber-Cell generalizes thecollective coordinate method for use in the efficient computing of thestable structures of these macromolecular assemblages. To illustratethis approach, consider the assembly of a complex structure from itsconstituent macromolecules (e.g., proteins or RNA). The challenge inconstructing a theory of these objects is that the essence of theirbehavior may involve both their overall morphology and the atomicstructure underlying their chemical reactivity.

[0351] Cyber-Cell computes the assembly of a free energy minimizingstructure from a given initial configuration of the molecules.Self-assembly is dictated by the cumulative effect of atomic forces. Tostart, introduce a set of collective coordinates Γ ^((m)) for each ofthe M constituent macromolecules m=1, 2, . . . , M. As the interatomicforces induce an interaction between these constituents, the equationsyielding the overall free energy minimizing structure form a set ofcoupled equations for these collective coordinates. This approachpreserves the atomic scale detail while attaining great computationalefficiency.

[0352] For each macromolecule m=1, 2, . . . , M, a space-warpingtransformation is introduced via${{\overset{\rightarrow}{r}}^{\prime} = {\sum\limits_{n}{\Gamma_{n}^{(m)}{{\overset{\rightarrow}{f}}_{n}\left( \overset{\rightarrow}{r} \right)}}}},{m = 1},2,\quad {{\dddot{}}\quad {M.}}$

[0353] This transformation takes a point {right arrow over (r)} to a newpoint {right arrow over (r)}′. The atomic coordinates of the m-thmacromolecule move via a change in the Γ^((m))s so as to minimize thefree energy F^(tot) of the M macromolecular assemblage. Let F beF^(tot), except with the atomic coordinates of each macromoleculerelated to its Γs and to a set of reference coordinates that areindicated with a superscript “0,” i.e., {right arrow over(r)}_(i)={right arrow over (r)}_(i)(Γ^((m)), {right arrow over (r)}^(o). . . {right arrow over (r)}_(N) ^(o)), where the molecule has N_(m)atoms. Then Γ^((m)) is determined as the solution of${\frac{\Gamma_{n}^{(m)}}{\tau} = {- \frac{\partial F}{\partial\Gamma_{n}^{(m)}}}},{m = 1},2,\quad {{\dddot{}}\quad {M.}}$

[0354] These equations are solved until the rate of change of the Γs isreduced appreciably, and then a similar procedure is used for the atomiccoordinates via a solution of d{right arrow over (r)}_(i)^((m))/dτ=−∂F^(tot)/∂{right arrow over (r)}_(i) ^((m)). This Γ/γ cycleis repeated until F^(tot) is minimized. Finally, the above procedure canbe generalized for the solution of Newton's equation for carrying outefficient molecular dynamics simulations

[0355] The benefit of Cyber-Cell's procedure is that changes in the Γsallow for overall translation, rotation, bending, and twisting of eachmacromolecule as the macromolecules organize to form the free energyminimizing assembly. Massive computations based on direct atomicsimulation are unfeasible while the present approach yields results onavailable computer hardware.

[0356] Many of the equations describing mesoscopic cellular subsystemscan be solved using numerical methods. The descriptive variables, eitheron a surface or in a 3-D volume, are solved by finite elementtechniques. A key problem in many cases is the need to constrain theminimization due to mass conservation or other conditions.

[0357] Mesostructures, such as the nucleoid, interact with the cytoplasmand other intracellular features through an exchange of molecules. Thisexchange takes place across a surface defining the nucleoid region. Asimple model of a subcellular body assumes that the configuration of thebody's macromolecules rapidly adjusts to the internal medium but thatthe latter is controlled by the kinetics of exchange with thesurroundings across a boundary surface. A schematic view of such a modelsystem is suggested in FIG. 47. The overall dynamics of the model can bequite dramatic as the response of the macromolecules can be nonlinearlyrelated to the internal compositional state. The free energy of thecompartment, F^(tot), is assumed to be given by

F ^(tot) =U(ξ)+∫d ³ rf

[0358] including entropic effects from internal vibrations plus a termfrom the interaction of any membranes. Hence U is a functional ofmembrane shape.

[0359] In the quasi-equilibrium model, F^(tot) is minimized with respectto ξ and the distribution of composition c(={c₁, c₂, . . . , c_(N)}) ofthe N continuum molecular species in the mesoscopic compartment. Ifn_(i) ^(tot) is the total number of moles of species i in thecompartment, then F^(tot) is to be minimized with respect to c for agiven n ^(tot). Thus one has${\mu_{i} \equiv \frac{\delta \quad F^{{Tot}.}}{\delta \quad c_{i}}} = {\overset{\_}{\mu}}_{i}$$\frac{\partial U^{*}}{\partial\xi_{\alpha}} = 0.$

[0360] The effective potential U* is defined via

U*=U−∫d ³ rp

[0361] for pressure p=$p = {{\sum\limits_{i = 1}^{N}{c_{i}\mu_{i}}} - {f.}}$

[0362] The constants {overscore (u)}_(i) can be determined via a penaltymethod.

[0363] The time course of n ^(tot) is determined from the exchange withthe surroundings. Let J_(i) be the net influx of component i into thecompartments. Assuming that J_(i) depends on c and c ⁰ (theconcentrations in the surroundings), and possibly on the electricalpotentials V and V⁰ as well, net conservation of mass yields$\frac{n_{i}^{tot}}{t} = {\int{^{2}{{rJ}_{i}\left( {\underset{\_}{c},{\underset{\_}{c}}^{0},V,V^{0}} \right)}}}$

[0364] where the integral is over the compartment surface just insidethe compartment. Thus if c ⁰ and V⁰ are known, this quasi-equilibriummodel gives the coupled dynamics of mass exchange with the surroundingsand the free energy minimizing internal state.

[0365] In the nucleoid, the dense packing of macromolecules can greatlyslow down the migration of molecules. Thus, the assumption ofdiffusional equilibrium as used above may break down. In these cases,Cyber-Cell's intracompartmental dynamics are augmented withtime-dependent mesoscopic reaction-transport equations.

X. Data for Cell Modeling

[0366] Cyber-Cell integrates a variety of data types and qualities intoits model development and calibration process. Thus, up-to-dateknowledge of the types of data available is of paramount importance. Asseen from FIG. 53a, data are divided into seven categories. Biochemicalkinetic and thermodynamic data are needed for modeling transcription,translation, and metabolic processes. Examples of this type of datainclude enzyme affinity for a substrate, equilibrium constants, reactionrates, Gibbs free energy, and entropy values. Advances in analyticalbiochemical spectroscopy, microscopy, chromatography, andelectrophoresis provide a wealth of knowledge related to thephysico-chemical dynamics of cells. Techniques such as dynamic lightscattering spectroscopy, matrix-assisted laser desorption/ionizationmass spectroscopy, multidimensional HPLC-IMS-MS, NMR spectroscopy,UV/Visible spectroscopy, and SDS-PAGE electrophoresis allow biologiststo gain extensive knowledge of the composition, function, andconformation of proteins and produce data usable by Cyber-Cell.

[0367] For simulations of prokaryotic systems, the wealth ofphysiological, metabolic, genetic, proteomic, and x-ray crystallographydata currently available on E. coli make it ideal for whole celltesting. The E. coli genome has been extensively and comprehensivelystudied, and the current explosion of E. coli proteomic studies has ledto creation of many proteomic and genomic web-based databases availablefrom a variety of institutions around the world. For example, some ofthe more noteworthy are presented in the following Table. Much of thesame type of information is available, and from some of the samesources, for the yeast cell Saccharomyces cerevisiae, making it idealfor whole cell eukaryotic testing. TABLE Database Name Web AddressComment Kyoto Encyclopedia of Genes and www.genome.ad.jp/kegg/ Genomics,Genomes (KEGG), Institute for Chemical Proteomics, Enzyme Research,Kyoto University Kinetics What is There? (WIT), Argonne Nationalwit.mcs.anl.gov/WIT/ Genomics, Enzyme Laboratory, USA Kinetics BRENDA:The Enzyme Database, srs.ebi.ac.uk/srs6bin/cgi-bin/wgetz?- EnzymeKinetics European Bioinformatics Institute,page+LibInfo+-id+66MWY1G5_nt+- Hinxton, UK lib+BRENDA Regulon DB: Adatabase on tula.cifn.unam.mx:8850/regulondb/regulon Genomicstranscriptional regulation of E. coli, _intro.frameset Laboratory ofComparative Biology, Universidad Nacional Autonoma de Mexico E. coliDatabase Collection (ECDC), susi.bio.uni-giessen.de/ecdc/ecdc.htmlGenomics, Institut f{umlaut over (ur)} Milro- und Molekularbiologie,Proteomics Giesser, Germany E. coli Stock Center Database, Yalecgsc.biology.yale.edu/cgsc.html Genomics, University, USA ProteomicsNCBI Entrez: E. coli k-12 Complete www.ncbi.nlm.nih.gov/cgi- GenomicsGenome, National Institutes of Health,bin/Entrez/framik?db=genome&gi=115 USA MetalGen: A graphic-orienteddatabase ftp://pasteur.fr/pub/GenomeDB Genomics, Enzyme which linksmetabolism to the genome of Kinetics E. coli, Institute Pasteur, FranceColibri: A complete dataset of DNA and genolist.pasteur.fr/ColibriGenomics, protein sequences derived from E. coli k- Proteomics 12 andlinked to relevant annotations and functional assignments, InstitutePasteur, France EcoGene: Database of genes, proteins, andbmb.med.miami.edu/Ecogene/EcoWeb Genomics, intergenic regions of E. colik-12, Proteomics University of Miami, USA E. coli Strain Database ofNational www.shigen.nig.ac.jp/ecoli/strain/ Genomics Institute ofGenetics, Japan Genobase 3.0, Nara Institute of Sciencee.coli.aist-nara.ac.jp Genomics and Technology, Japan E. coli GenomeProject, University of www.genome.wisc.edu Genomics Wisconsin atMadison, USA Profiling of E. coli Chromosome (PEC),www.shigen.nig.ac.jp/ecoli/pec Genomics National Institute of Genetics,Japan EcoCyc: Encyclopedia of E. coli Genesecocyc.PangeaSystems.com/ecocyc/ecocyc. Genomics, and Metabolism htmlProteomics, Enzyme Kinetics GenProtEC: E. coli Genome andgenprotec.mbl.edu/start Genomics, Proteomic Database, Marine BiologyProteomics Laboratory, Woods Hole, MA, USA Express DB RNA ExpressionDatabase, arep.med.harvard.edu/cgi- Genomics, Lipper Center forComparative Genetics, bin/ExpressDBecoli/EXDstart Proteomics HarvardUniversity, USA

XI. Information Theory and Cell Model Data Integration

[0368] Cyber-Cell resolves gaps in the understanding of many cellprocesses via its information theory approach. This leads to acomputational algorithm for simultaneously using data of various typesand qualities to constrain the ensemble of possible processes and rateparameters. A probability functional method is used to account for thetime-dependence of the concentrations of chemical species whosemechanisms of production or destruction are not known but whoseenzymatic or other role is known.

[0369] Cyber-Cell can be calibrated when some of its processes are notwell understood (e.g., post-translation chemical kinetics network andrate laws). Cyber-Cell addresses the dilemma of calibrating or running amodel that is incomplete, a situation which should be faced in any cellmodeling effort. For example, cell extract or other in vitro experimentsare known to yield different rate parameters than those in the completecell-seemingly implying the need for a complete model before calibrationcan commence. However, by its information theory method, Cyber-Cellpredicts the most probable time course of enzymes or other factors thatplay a key role, but whose mechanisms of production or destruction arenot known. Cell response data are used to predict the most probable timecourse of these factors by solving functional differential equationsderived using information theory. In this way, information theory withCyber-Cell calibrates rate parameters for reactions in which an enzymetakes part even though the origins of that enzyme are poorly understood.

[0370] Cyber-Cell's overall data and modeling integration scheme isportrayed in FIG. 53a. The Figure summarizes the richness of data typesavailable for E. coli and yeast that Cyber-Cell integrates. FIG. 53bdetails an exemplary information theory methodology that automatesCyber-Cell model building and calibration processes. Cyber-Cell isintegrated with a variety of data to compute the most probable values ofthe least well constrained model parameters via the information theorymethod. The method also yields the most probable time-course of theconcentrations of key chemical species whose origins are not known. Thecomputation involves execution of many Cyber-Cell simulations that canbe run in parallel. For example, in FIG. 53b, the Cyber-Cell predictedproteome is processed via a synthetic tryptic digest and experimentallycalibrated fragment flight time and drift time relationships.Information theory is used to compare Cyber-Cell's predicted MDS datawith observed MDS data and to integrate observed data and comprehensivereaction-transport-mechanical modeling. A similar approach is used forother data types.

[0371] The Cyber-Cell model is calibrated against known results. Manycalibration problems are formulated as A x=y, where y is a vector ofobserved quantities, and x is the vector of parameters needed for themodel. The matrix A usually depends on x. Because the problem is usuallyill-posed, A is ill-conditioned. The error E equals ∥A x−y∥², aquadratic to be minimized with respect to x. A number of techniques havebeen proposed to regularize such systems. Tikhonov's approach introducesa small regularization parameter λ to modify E to equal ∥Ax x−y∥²+λ∥x∥².Regularization is achieved by minimizing this function with respect tox. However, the selection of the regularization parameter λsignificantly affects the inversion. This technique is equivalent to theminimization of E subject to the constraint ∥x∥²=f through the use ofthe Lagrange multiplier technique. Minimization of the modified -errordamps the large oscillations in the least-squares solution. TheLevenberg-Marquardt technique uses a full Newton approach and introducesanother regularization parameter that is added to the diagonal of theJacobian matrix. Once again, the choice of the regularization parameteris difficult, and the usual practice is to change it as the simulationprogresses so as to minimize its effect. In practice, multipleregularization techniques are employed simultaneously.

[0372] In Cyber-Cell, information and homogenization theories areunified into a technique that accounts for multiple scales (spatial andtemporal) in the problem of interest. This provides a physicallymotivated regularization technique and allows the control ofregularization parameters with physical arguments. While previoustechniques assume that regularization and a posteriori analysis of theresults are independent, Cyber-Cell's information theory-based approachintegrates multiple types and qualities of observed data andregularization techniques and quantifies the uncertainty in the results.

[0373] Cell models involve poorly constrained factors that should beestimated if progress is to be made. Cyber-Cell uses a probabilisticapproach based on a new formulation of information theory to estimatethese factors. The three types of factors in this approach are asfollows:

[0374] A Discrete Parameters (e.g., the stoichiometric coefficients thatspecify the numbers of each molecular species participating in a givenreaction or parameters determining protein sequence→function rules);

[0375] B Continuous Parameters (e.g., reaction rate coefficients,membrane transport parameters, equilibrium constants; they can reside ina continuous range); and

[0376] C Functions (e.g., the time-course of the concentration ofchemical species whose role is known, such as an enzyme, but whosemechanisms of creation and destruction are not known).

[0377] To estimate the most probable values of types A and B and thetime-course of type C, Cyber-Cell uses a method that surmounts thelimitations of regularization techniques used in past approaches. To doso, Cyber-Cell introduces the probability p(Γ), (Γ=A, B, C). Perhaps themost dramatic aspect of this approach is a differential equation for themost probable time-course of the C-factors.

[0378] Normalization of the probability p(Γ) implies $\begin{matrix}{{{\underset{\Gamma}{S}\rho} = 1}\quad} & (8)\end{matrix}$

[0379] where

implies a sum over the discrete variables A, an integration over B, anda functional integration over C. To apply this, divide experiments intoN_(e) types labeled k=1, 2, . . . , N_(e), for each of which there is aset of data values O^((k)). For example, O^((l)) could be thetime-course of a set of intracellular constituents as they change inresponse to an injected chemical disturbance, O⁽²⁾ can be the normalproteome, O⁽3) can be the proteome of a virally infected cell, and O⁽⁴⁾can be a set of membrane potentials in a rest state or as they change inresponse to an electrode-imposed disturbance. Through Cyber-Cell,compute a set of values Ω^((k))(Γ) of predicted data. As Cyber-Cellpredictions depend on the choice of Γ, so do the values of the Ω^((k)).Define the k-th type error by:$E^{(k)} = {\sum\limits_{i = 1}^{N^{(k)}}{\left( ~{{\Omega_{i}^{(k)}(\Gamma)} - O_{i}^{(k)}} \right)^{2}.}}$

[0380] Typically, however, data are only indirectly related to the modelparameters Γ. The power of this method is that even very indirect data(e.g., membrane potentials) can be used to find the most probable valueof Γ (e.g., the rate coefficient for a metabolic reaction).

[0381] The entropy S of information theory is a measure of the overalluncertainty about the value of Γ; it is defined via$S = {{- \underset{\Gamma}{S}}{{\rho ln\rho}.}}$

[0382] In the spirit of information theory, ρ is the probability thatmaximizes S subject to the normalization equation (8) and the availabledata. Among the latter are the error conditions $\begin{matrix}{{\underset{\Gamma}{S}\rho \quad E^{(k)}} = E^{{(k)}*}} & (9)\end{matrix}$

[0383] where E^((k)) is the value of E^((k)) as estimated fromexperimental data error analysis and errors in the numerical techniquesin Cyber-Cell.

[0384] It is necessary to apply regularization constraints on the time(t) dependence of the continuous variables C(t). For example, assumethat estimates based on known reactions suggest that C varies on asecond timescale or longer, not, say, on a nanosecond scale. Then,impose a constraint on the expected rate of change of C: $\begin{matrix}{{\underset{\Gamma}{S}\rho {\int_{0}^{i_{f}}{{t\left( \frac{\partial C_{J}}{\partial t} \right)}^{2}}}} = {t_{f}X_{J}}} & (10)\end{matrix}$

[0385] for the j-th time-dependent parameter C_(j); the value of X_(j)represents the typical value of the square of the rate of change ofC_(j) averaged over the ensemble and the total time (t_(f)) of theexperiment.

[0386] Introducing Lagrange multipliers β_(k) and Λ_(j), shows that theρ that maximizes S subject to equations (8 and 10) takes the form${\ln \quad \rho} = {{{- \ln}\quad Q} - {\frac{1}{2}{\sum\limits_{j = 1}^{M}{\int_{0}^{i_{f}}{{t}\quad \left( {\Lambda_{J}\left( \frac{\partial C_{J}}{\partial t} \right)} \right)^{2}}}}} - {\sum\limits_{k = 1}^{N_{e}}{\beta_{k}{{E^{(k)}(\Gamma)}.}}}}$

[0387] The factor Q is a constant to be determined by imposing theconstraints of equation (8). The most probable value of Γ is that whichmaximizes ρ. For A this follows from a discrete search; for B(=B₁, B₂, .. . , B_(N) _(b) ) and C(=C₁, C₂, . . . , C_(N) _(c) ) solve$\begin{matrix}{{{{\sum\limits_{k = 1}^{N_{e}}{\beta_{k}\frac{\partial E^{(k)}}{\partial B_{J}}}} = 0},{i = 1},2,\quad {\dddot{}}\quad,N_{b}}{and}{{{{\Lambda_{J}\frac{\partial^{2}C_{J}}{\partial t^{2}}} + {\sum\limits_{k = 1}^{N_{\Gamma}}{\beta_{k}\delta \quad \frac{E^{(k)}}{\delta \quad C_{J}}}}} = 0},{j = 1},2,\quad {\dddot{}}\quad,{N_{c}.}}} & (11)\end{matrix}$

[0388] and

[0389] Equation (11) is a time-differential equation which hassimilarities in its behavior to a steady state diffusion equation in thetime dimension t. In analogy to ordinary derivatives, the functionalderivatives δE^((k))/δC_(j) measure the degree to which E^((K)) changeswhen the form of the function C_(j)(t) changes by an infinitesimalamount. As the Λ-parameters get larger, the Cs become smoother functionsof time. The values of the β and Λ parameters are determined in thisprocedure via the imposition of equations (9 and 10). This computationis implemented by assuming that ρ(F) is narrowly peaked about the mostprobable value of Γ.

[0390] A simple reaction model illustrates this approach. The modelinvolves three species X, Y, and C that are known to participate in thereactions

X+Y→2X

2X→products

[0391] 2Y→products

[0392] C+X→products

C+Y→2Y.  (12)

[0393] For this example, assume that all the reactions creating ordestroying X and Y are known, but that those affecting the catalyst Care not. Consider now the challenge of determining the catalystconcentration time-course C(t) given limited or noisy data on X(t) at aset of discrete times (but not Y(t)). Assume also that C is known at t=0and at the final time t_(f) (5 minutes). In order to test this approach,let

C(t)=e ^(−|sin(ωt)|)

[0394] and then generate X(t) via the numerical solution of mass actionrate laws for the mechanism of equation (12). Call this solution the“observed data”; various levels of noise are added to evaluate theeffect of uncertainty in the data.

[0395]FIGS. 56a and 56 b compare results for various levels of noise inthe experimental data. FIG. 56a shows the effect of 0.3% noise in theobserved data X(t) on the solution. In the absence of regularization,high frequency oscillations are amplified significantly even when thereis a small amount of noise in the observed data. In contrast, FIG. 56bshows that even when the level of noise is increased significantly (2%and 3% for thin solid and dashed lines, respectively), regularizationyields satisfactory results. The physically-motivated regularizationequation (10) increases the allowable noise in the experimental data byan order of magnitude. As this method is based on an objectiveprobability analysis, it provides the uncertainty in thepredictions—see, for example, FIG. 57 (showing the root mean squaredeviation of C(t) (dashed lines) for E*=0.001).

[0396] This approach yields accurate results even with limited and noisydata, a situation typical for experimental cell data. The method evenworks for highly nonlinear problems as for the above test system and fornumerical simulations-both of which are a key part of Cyber-Cell. Thus,this test case demonstrates the feasibility of Cyber-Cell's approach.

[0397] Cyber-Cell is calibrated using its unique information theoryapproach. This allows the use of diverse proteomic, genomic,biochemical, and other data sets. This automated approach not onlyobtains the most probable values of the rate and other parameters, butalso automatically obtains an assessment of the associated uncertainty.The uncertainty assessment provides guidelines for experimental researchteams in the design of the most efficient data acquisition strategy.Cyber-Cell is calibrated using data distinct from the test data set. Thewealth of available data (see Table above) and the rapidly increasingproteomic, genomic, and other databases make this feasible.

XII. Summary

[0398] The two above-described embodiments illustrate the broadapplicability of the invention, spanning as they do a range of timecoordinates from nanoseconds to geologic eons and a range of spacecoordinates from the atomic to the continental. In view of the manypossible embodiments to which the principles of this invention may beapplied, it should be recognized that these embodiments are meant to beillustrative only and should not be taken as limiting the scope of theinvention. Therefore, the invention as described herein contemplates allsuch embodiments as may come within the scope of the following claimsand equivalents thereof.

We claim:
 1. A method for producing a model of a region of interest, themethod comprising: collecting a first set of data points pertaining tothe region of interest; dividing the first data set into a second dataset and a third data set; populating a model with data points from thesecond data set; interpolating a data point in the model using a subsetof data points from the second data set; comparing a subset of datapoints in the model to a subset of data points in the third data set;and if comparing yields a discrepancy larger than an error limit, thenvarying a data point in the model corresponding to a data point in thesecond data set and repeating the interpolating and comparing.
 2. Themethod of claim 1 wherein collecting comprises collecting data points ofmore than one type.
 3. The method of claim 1 wherein dividing producesdata points common to the second and third data sets.
 4. The method ofclaim 1 wherein interpolating comprises applying multi-dimensional,finite-element methods to a subset of data points in the model.
 5. Themethod of claim 1 wherein interpolating comprises applying an equationto a subset of data points in the model, the equation being of a type inthe set: thermodynamic, chemical, genomic.
 6. The method of claim 1wherein interpolating is based, at least in part, on a spacing among asubset of data points in the model.
 7. The method of claim 1 whereincomparing yields a discrepancy based, at least in part, on a sum ofsquares of differences between a subset of data points in the model anda subset of data points in the third data set.
 8. The method of claim 1wherein varying a data point comprises varying an item in the set: avalue of the data point, a position of the data point in the model. 9.The method of claim 1 wherein varying comprising varying multiple datapoints in the model corresponding to data points in the second data set.10. The method of claim 1 wherein collecting comprises associatingmeasures of constraint with data points in the second data set andwherein varying comprises choosing a data point in the model to vary,the chosen data point's measure of constraint being less than that ofanother data point in the model.
 11. The method of claim 10 wherein ameasure of constraint is associated with a probable error range andwherein a larger error range yields a lower constraint.
 12. The methodof claim 10 wherein interpolating is based, at least in part, onmeasures of constraint of a subset of data points in the model.
 13. Themethod of claim 1 further comprising: estimating a probability of themodel resulting from the varying; wherein the varying comprises choosingan amount by which to vary a data point, the data point and the amountto vary the data point chosen, at least in part, in order to maximize anestimated probability of the model.
 14. The method of claim 13 whereinestimating a probability is subject to a constraint based on a subset ofdata points in the third data set.
 15. The method of claim 13 whereinestimating a probability comprises: calculating a probability functionalthat maximizes an entropy, the calculating subject to normalizing theprobability functional and subject to a constraint based on a subset ofdata points in the third data set.
 16. The method of claim 15 whereinentropy is defined to comprise a negative of a functional integral overpossible states of the model of the probability functional multiplied bya natural log of the probability functional.
 17. The method of claim 15wherein normalizing the probability functional comprises setting afunctional integral over possible states of the model of the probabilityfunctional to one.
 18. The method of claim 15 wherein the calculating issubject to a constraint based on a subset of data points in the thirddata set when a functional integral over possible states of the model ofthe discrepancy multiplied by the probability functional is equal to anensemble error average.
 19. The method of claim 15 wherein maximizing anestimated probability of the model comprises: determining where afunctional derivative of the probability functional with respect to themodel becomes zero.
 20. A computer-readable medium having instructionsfor performing the method of claim
 1. 21. A method of extending a modelof a region of interest along a coordinate, the method comprising:applying an equation to evolve the model a distance along thecoordinate; and maximizing a probable state of the evolved model. 22.The method of claim 21 wherein maximizing a probable state comprises:collecting a set of data points pertaining to the region of interest;comparing a subset of data points in the model to a subset of datapoints in the collected data set; and if comparing yields a discrepancylarger than an error limit, then varying a data point in the model andrepeating the comparing.
 23. The method of claim 22 wherein comparingcomprises comparing data points of more than one type.
 24. The method ofclaim 22 wherein comparing yields a discrepancy based, at least in part,on a sum of squares of differences between a subset of data points inthe model and a subset of data points in the collected data set.
 25. Themethod of claim 22 wherein varying a data point comprises varying anitem in the set: a value of the data point, a position of the data pointin the model.
 26. The method of claim 22 wherein varying comprisingvarying multiple data points in the model.
 27. The method of claim 22further comprising: estimating a probability of the model resulting fromthe varying; wherein the varying comprises choosing an amount by whichto vary a data point, the data point and the amount to vary the datapoint chosen, at least in part, to maximize an estimated probability ofthe model.
 28. The method of claim 27 wherein estimating a probabilityis subject to a constraint based on a subset of data points in thecollected data set.
 29. The method of claim 27 wherein estimating aprobability comprises: calculating a probability functional thatmaximizes an entropy, the calculating subject to normalizing theprobability functional and subject to a constraint based on a subset ofdata points in the collected data set.
 30. The method of claim 29wherein entropy is defined to comprise a negative of a functionalintegral over possible states of the model of the probability functionalmultiplied by a natural log of the probability functional.
 31. Themethod of claim 29 wherein normalizing the probability functionalcomprises setting a functional integral over possible states of themodel of the probability functional to one.
 32. The method of claim 29wherein the calculating is subject to a constraint based on a subset ofdata points in the collected data set when a functional integral overpossible states of the model of the discrepancy multiplied by theprobability functional is equal to an ensemble error average.
 33. Themethod of claim 29 wherein maximizing an estimated probability of themodel comprises: determining where a functional derivative of theprobability functional with respect to the model becomes zero.
 34. Themethod of claim 21 wherein the model is evolved along a coordinate inthe set: time, space.
 35. The method of claim 21 wherein applying anequation comprises applying an equation to a subset of data points inthe model, the equation being of a type in the set: thermodynamic,chemical, genomic.
 36. A computer-readable medium having instructionsfor performing the method of claim
 21. 37. A method of estimating aprobability of a model of a region of interest, the method comprising:collecting a set of data points pertaining to the region of interest;comparing a subset of data points in the model to a subset of datapoints in the collected data set to yield a discrepancy; and calculatinga probability functional that maximizes an entropy, the calculatingsubject to normalizing the probability functional and subject to aconstraint based on a subset of data points in the collected data set.38. The method of claim 37 wherein comparing comprises comparing datapoints of more than one type.
 39. The method of claim 37 whereincomparing yields a discrepancy based, at least in part, on a sum ofsquares of differences between a subset of data points in the model anda subset of data points in the collected data set.
 40. The method ofclaim 37 wherein entropy is defined to comprise a negative of afunctional integral over possible states of the model of the probabilityfunctional multiplied by a natural log of the probability functional.41. The method of claim 37 wherein normalizing the probabilityfunctional comprises setting a functional integral over possible statesof the model of the probability functional to one.
 42. The method ofclaim 37 wherein the calculating is subject to a constraint based on asubset of data points in the collected data set when a functionalintegral over possible states of the model of the discrepancy multipliedby the probability functional is equal to an ensemble error average. 43.A computer-readable medium having instructions for performing the methodof claim
 37. 44. A method for producing a model of fracture locationsand fracture characteristics in a geologic basin, the method comprising:collecting a first set of data points pertaining to the geologic basin;dividing the first data set into a second data set and a third data set;populating a model with data points from the second data set; processinga subset of data points in the model by applying equations to simulaterock rheology by integrating continuous deformation with fracture,fault, gouge, and pressure solutions; processing a subset of data pointsin the model by applying equations to simulate mechanical processes tocoevolve deformation with multi-phase flow, petroleum generation,mineral reactions, and heat transfer; comparing a subset of data pointsin the model to a subset of data points in the third data set; and ifcomparing yields a discrepancy larger than an error limit, then varyinga data point in the model corresponding to a data point in the seconddata set and repeating the processing and comparing.
 45. The method ofclaim 44 wherein collecting a first set of data points comprisescollecting data in the set: well log data, surface data, core data,seismic data.
 46. A computer-readable medium having instructions forperforming the method of claim
 44. 47. A method for producing a model ofa biological cell, the method comprising: collecting a first set of datapoints pertaining to the biological cell; dividing the first data setinto a second data set and a third data set; populating a model withdata points from the second data set; processing a subset of data pointsin the model by applying equations to simulate reactions, the equationsbeing of types in the set: chemical kinetic, proteomic, genomic,glycolysis, citric acid cycle, amino acid synthesis, nucleotidesynthesis, membrane transport; comparing a subset of data points in themodel to a subset of data points in the third data set; and if comparingyields a discrepancy larger than an error limit, then varying a datapoint in the model corresponding to a data point in the second data setand repeating the processing and comparing.
 48. The method of claim 47wherein collecting a first set of data points comprises collecting datain the set: microscopy, genomics, proteomics, multi-dimensionalspectroscopy, x-ray crystallography, thermodynamics, biochemicalkinetics, bioelectrics.
 49. A computer-readable medium havinginstructions for performing the method of claim 47.